Preface



Since 1998, RAID has established its reputation as the main event in research 
on intrusion detection, both in Europe and the United States. Every year, RAID 
gathers researchers, security vendors and security practitioners to listen to the 
most recent research results in the area as well as experiments and deployment 
issues. 

This year, RAID has grown one step further to establish itself as a well-known 
event in the security community, with the publication of hardcopy proceedings. 
RAID 2000 received 26 paper submissions from 10 countries and 3 continents. 
The program committee selected 14 papers for publication and examined 6 of 
them for presentation. In addition, RAID 2000 received 30 extended abstract
proposals; 15 of these extended abstracts were accepted for presentation.
Extended abstracts are available on the website of the RAID symposium series,
http://www.raid-symposium.org/. We would like to thank the technical program
committee for the help we received in reviewing the papers, as well as all
the authors, including those whose submissions were not accepted, for their
participation.

As in previous RAID symposiums, the program alternates between funda- 
mental research issues, such as new technologies for intrusion detection, and 
more practical issues linked to the deployment and operation of intrusion detec- 
tion systems in a real environment. Five sessions have been devoted to intrusion 
detection technology, including modeling, data mining and advanced techniques. 
Four sessions have been devoted to topics surrounding intrusion detection, such 
as evaluation, standardization and legal issues, logging and analysis of intrusion 
detection information. RAID will also host two panels, one on practical deploy- 
ment of intrusion-detection systems where users of the technology will share 
their experience with the audience and one on the distributed denial of service 
attacks that generated a lot of attention in early 2000. 

In summary, we hope that this very dense program and mix of practical and 
theoretical issues will satisfy the users of intrusion detection systems and en- 
courage the researchers in the area to continue improving their technology. 



October 2000 



Herve Debar 
S. Felix Wu 




Organization 



RAID 2000 is hosted by and gratefully acknowledges the support of ONERA 
Centre de Toulouse. 



Conference Chairs 

Executive Committee Chair Marc Dacier (IBM Research, Switzerland) 
Program Co-Chairs Herve Debar (IBM Research, Switzerland) 

S. Felix Wu (North Carolina State University, USA) 
Publication Chair Ludovic Me (Supelec, France) 



Program Committee 



Matt Bishop               University of California at Davis, USA
Dick Brackney             National Security Agency, USA
Rowena Chester            University of Tennessee, USA
Frederic Cuppens          ONERA, France
Marc Dacier               IBM Research, Switzerland
Herve Debar               IBM Research, Switzerland
Yves Deswarte             LAAS-CNRS and SRI-International, France
Terry Escamilla           IBM, USA
Deborah Frincke           University of Idaho, USA
Tim Grance                National Institute of Standards and Technology, USA
Ming-Yuh Huang            The Boeing Company, USA
Erland Jonsson            Chalmers University of Technology, Sweden
Sokratis Katsikas         University of the Aegean, Greece
Baudouin Le Charlier      Universite de Namur, Belgium
Ludovic Me                Supelec, France
Abdelaziz Mounji          Swift, Belgium
Vern Paxson               ACIRI/LBNL, USA
Jean-Jacques Quisquater   Universite Catholique de Louvain, Belgium
Mark Schneider            National Security Agency, USA
Steve Smaha               Free Agent, USA
Peter Sommer              London School of Economics and Political Science, England
Stuart Staniford-Chen     Silicon Defense, USA
Peter Thorne              University of Melbourne, Australia
S. Felix Wu               North Carolina State University, USA
Kevin Ziese               Cisco Systems, USA







Additional Referees 

Dominique Alessandri IBM Research, Switzerland 

Klaus Julisch IBM Research, Switzerland 

Andreas Wespi IBM Research, Switzerland 

Local Organization Committee 

Frederic Cuppens ONERA, France 

Claire Saurel ONERA, France 

Sponsoring Institutions 

Alcatel 

IBM 

Internet Security Systems 




Table of Contents 



Logging

Better Logging through Formality
    Chapman Flack and Mikhail J. Atallah

A Pattern Matching Based Filter for Audit Reduction and
Fast Detection of Potential Intrusions
    Josue Kuri, Gonzalo Navarro, Ludovic Me and Laurent Heye

Transaction-Based Pseudonyms in Audit Data
for Privacy Respecting Intrusion Detection
    Joachim Biskup and Ulrich Flegel

Data Mining

A Data Mining and CIDF Based Approach for Detecting Novel and
Distributed Intrusions
    Wenke Lee, Rahul A. Nimbalkar, Kam K. Yee, Sunil B. Patil,
    Pragneshkumar H. Desai, Thuan T. Tran and Salvatore J. Stolfo

Using Finite Automata to Mine Execution Data for Intrusion Detection:
A Preliminary Report
    Christoph Michael and Anup Ghosh

Modeling Process Behavior

Adaptive, Model-Based Monitoring for Cyber Attack Detection
    Alfonso Valdes and Keith Skinner

A Real-Time Intrusion Detection System Based
on Learning Program Behavior
    Anup K. Ghosh, Christoph Michael and Michael Schatz

Intrusion Detection Using Variable-Length Audit Trail Patterns
    Andreas Wespi, Marc Dacier and Herve Debar

Flexible Intrusion Detection Using Variable-Length Behavior Modeling
in Distributed Environment: Application to CORBA Objects
    Zakia Marrakchi, Ludovic Me, Bernard Vivinis and Benjamin Morin







IDS Evaluation

The 1998 Lincoln Laboratory IDS Evaluation (A Critique)
    John McHugh

Analysis and Results of the 1999 DARPA Off-Line Intrusion
Detection Evaluation
    Richard Lippmann, Joshua W. Haines, David J. Fried,
    Jonathan Korba and Kumar Das

Using Rule-Based Activity Descriptions
to Evaluate Intrusion-Detection Systems
    Dominique Alessandri

Modeling

LAMBDA: A Language to Model a Database for Detection of Attacks
    Frederic Cuppens and Rodolphe Ortalo

Target Naming and Service Apoptosis
    James Riordan and Dominique Alessandri

Author Index




Better Logging through Formality 

Applying Formal Specification Techniques to Improve 
Audit Logs and Log Consumers 



Chapman Flack* and Mikhail J. Atallah** 
CERIAS, Purdue University 

1315 Recitation Bldg., West Lafayette, IN 47907-1315 USA 
{flack,mja}@cerias.purdue.edu



Abstract. We rely on programs that consume audit logs to do so suc- 
cessfully (a robustness issue) and form the correct interpretations of the 
input (a semantic issue). The vendor’s documentation of the log for- 
mat is an important part of the specification for any log consumer. As 
a specification, it is subject to improvement using formal specification
techniques. This work presents a methodology for formalizing and refin- 
ing the description of an audit log to improve robustness and semantic 
accuracy of programs that use the log. Ideally applied during design of 
a new format, the methodology is also profitably applied to existing log 
formats. Its application to Solaris BSM (an existing, commercial format) 
demonstrated utility by detecting ambiguities or errors of several types 
in the documentation or implementation of BSM logging, and identify- 
ing opportunities to improve the content of the logs. The products of 
this work are the methodology itself for use in refining other log formats
and their consumers, and an annotated, machine-readable grammar for 
Solaris BSM that can be used by the community to quickly construct 
applications that consume BSM logs. 

Keywords: log, formal, specification, documentation, reliability, inter- 
operability, CIDF, BSM, grammar 



1 Introduction 

Audit logs maintained by computing systems can be used for a variety of pur- 
poses, such as to detect misuse, to document conformance to policy, and to 
understand and recover from software or hardware failures. Any such application
presumes a log consumer, software that can read and analyze the log, and
draw conclusions of interest about the state history of the system that produced
the log.

* Supported in part by an Intel Foundation Graduate Fellowship, by contracts
MDA904-96-1-0116 and MDA904-97-6-0176 from Maryland Procurement Office,
and by sponsors of the Center for Education and Research in Information Assurance
and Security.

** Supported in part by Grant EIA-9903545 from the National Science Foundation,
and by sponsors of the Center for Education and Research in Information Assurance
and Security.

Two requirements that apply to any log consumer are easily stated: the 
consumer should be able to read any sequence of the possible log records without 
failure, and any results computed (or conclusions drawn) should be correct (or 
justifiable). These requirements may not have the same weight in all applications. 
Some unreliability in a tool devised ad hoc to count uses of a certain software 
package, for example, may be tolerated as a practical matter. 

Also, it may suffice for an ad hoc tool to skim the log for a very small frac- 
tion of its information content, as the use to which the information will be put 
is known in advance. However, initiatives like the Common Intrusion Detection 
Framework (CIDF)[14] place a renewed emphasis on exchanging event informa- 
tion among multiple agents, where one agent should not discard information 
another may need. CIDF’s Common Intrusion Specification Language (CISL) is 
necessarily expressive enough to convey the semantic nuances of event records; 
that very expressiveness increases the pressure on any system that would trans- 
late logs into CISL not to miss or mistranslate those nuances, lest later analysis 
be led into error. In a production intrusion detection system, failures caused by 
incorrect handling of the input, or unsound conclusions resulting from semantic 
misunderstandings, may be costly. In unlucky cases, they may represent new, 
exploitable vulnerabilities introduced by the security tool itself. 

This paper describes a way to reduce the risk, observing simply but centrally 
that the usual documentation accompanying a system that produces logs is also 
a partial specification for software that must consume those logs. Software engi- 
neering techniques for formalizing specifications can therefore be applied to find 
and purge ambiguities and inconsistencies, and reshape the document into one 
from which reliable log consumers can more readily and consistently be built. 
Opportunities to improve the log itself may be revealed in the same process. 

The amount of attention devoted here to the mere task of reading a stream 
of data may be surprising to a reader who has not been immersed for some time 
in extracting meaning from audit logs of general purpose systems. The case that 
an audit log is a peculiarly complex stream of data, presenting subtle issues of 
interpretation, will be built in Sect. 4 with quantitative support in Sect. 6. 

2 Contributions 

Contributions of this work include artifacts of immediate use to the community, 
and suggestions with demonstrated potential to improve design of future audit 
producers and consumers. 

— A grammar and lexical analyzer package for Sun’s Basic Security Module[8]
(BSM) audit data through Solaris 2.6. The package, requiring Java[3] and
the ANTLR parser generator[11], produces a parser for BSM audit that can
be rapidly extended with processing specific to an application. The parser is
conservative: it may signal a syntax error on an undocumented BSM record
that was not covered in our test data, but will not silently accept invalid
input. Undocumented rules are readily added to the grammar as they are
discovered. The package, available for educational and research purposes, 
has been used to speed development in one completed and several ongoing 
BSM-related projects. 

— The BSM grammar in the package is extensively annotated, with hyper- 
links from grammar rules to the corresponding pages of Sun documentation. 
While others working with BSM have undoubtedly noted some of the same 
ambiguous or misdocumented records that we have, and probably some not 
present in our test data, there has not always been a document available to 
the community and intended to detail such discoveries in one place. 

— While grammars and parsing techniques arguably offer a natural approach to 
the reliable processing of structured information that auditing requires, they 
have been strangely often neglected in practice, as described in Sect. 4. Ques- 
tions of practicality may have discouraged more widespread adoption. For 
example, while it is clear that audit records must be described by some grammar,¹
that observation does not alone guarantee a tractably parsed grammar.[4]
Section 5.2 argues for more optimism, and this work demonstrates
practicality and effectiveness on a widely-available, commercial audit format. 
At the same time, the complete grammar can be studied for useful insights 
such as which aspects of BSM logs are beyond the expressive power of, e.g. 
regular expressions. 

— The requirement to draw justifiable conclusions, mentioned in Sect. 1, reveals 
that not only syntax but semantics must be captured. Semantic content is 
an, or possibly the, important thing to get right. Providing a grammar does 
not lay the issue to rest, but does help in two ways. First, Sect. 4 argues that 
the grammatical structure of the input cannot be ignored without sacrific- 
ing semantic information. Second, in a more cognitive than technical vein, 
without a concrete representation of the details to be resolved, promising 
discussions about audit content sometimes end up, to recycle Pauli, “not 
even wrong.” 

3 Audience 

This paper assumes a familiarity with parsing concepts and some parser gen- 
erating tool, such as might be acquired in an undergraduate compilers course. 
Examples will be in the notation of the ANTLR parser generator, similar enough 
to other tools’ notations that the reader need not know ANTLR per se to follow 
the arguments, but may refer to [11] to pursue niceties that are tangential. 

Examples of BSM event record formats will be presented and discussed. For 
the most part, these are records of UNIX system calls and the discussion may 
assume a familiarity with the operations and subtleties of the programming 
interface for UNIX or a similar operating system, and some of the ways those 
operations can be abused, such as might be expected in the intrusion detection
community. Terms such as ‘rules’ and ‘transition diagrams’ familiar from rule-based
(e.g. [10]) and state-based (e.g. [5]) intrusion detection efforts will be used
freely. Sample BSM records, being binary rather than text, would add little to
the perspicuity of the grammar examples they match, and such low level details
play no role in the discussions. No specific familiarity with BSM will be required
to follow the arguments, though the reader whose curiosity is piqued may refer
to [8].

¹ They are produced by a Turing-equivalent machine.

4 Other Approaches 

It is not necessary to have a grammar to extract some useful information from 
an input stream. Various intrusion detection systems support BSM audit logs, 
and we obtained access to code or internals documentation to see how three 
of them do it.² ASAX[10], IDIOT[2], and USTAT[5] all skim information from
BSM logs without concern for grammatical structure. This section will compare 
their approaches and examine some consequences. The critique is not intended 
to disparage these projects, which set out to shed light on other aspects of the 
intrusion detection problem and did so with acknowledged success. The con- 
sequences discussion will include some issues that apply to a grammar-based 
approach as well, and so should be considered in any audit project. 

4.1 Canonical Form 

All three systems have some notion of a canonical form into which the native 
BSM log is transformed as a preprocessing step. Their canonical forms are rather 
different in intent and reality from CISL. Where CISL sets out to allow hetero- 
geneous applications across platforms to share event and analysis data and agree 
on their interpretation, the canonical forms of ASAX, IDIOT, and USTAT serve 
mostly to simplify porting of the tools themselves. 

4.2 How the Log Is Processed 

ASAX. In ASAX, most of the work of converting BSM to the “normalized audit 
data format” (NADF) is done in conv_bsm.c. Examination reveals a table, each
row of which contains a BSM token ID, a base NADF field ID, and a pointer. 
There is a row for every BSM token type expected to occur; for tokens that 
can appear multiply in one event record (arg, for example), several rows are 
allocated. Before each BSM record is processed, the pointer is nulled in each 
row. Then, for each token in the record, if a row can be found that contains the 
token’s ID and a null pointer, the first such row is updated to point to the token. 

After the whole record has been read, the rows for token types that did not 
appear in the record still have null pointers, while rows for token types that 

^ We suspect the way these tools process the BSM input is typical, but there are 
certainly other tools that support BSM for which we did not obtain source code or 
sufficiently low-level documentation to make a determination. 
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appeared point to the corresponding tokens. Finally, in the order of appearance 
in the table, tokens are copied field-by-field to the NADF record. The final 
NADF record contains verbatim copies of the fields from the BSM record, but 
with padding to support aligned access, and reordered to match the order of 
token IDs in the table. 
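
The table-driven conversion described for ASAX above can be sketched as follows. This is a minimal illustration in Python, not the ASAX source: the token IDs, the base field IDs, and the helper name `convert_record` are hypothetical, padding for aligned access is omitted, and the choice to ignore a token that finds no free row is an assumption of the sketch (the description above does not say what happens in that case).

```python
# Hypothetical sketch of table-driven canonicalization in the style described
# for ASAX. Token IDs and base field IDs are invented for illustration.

# One row per expected token occurrence: (token_id, base_field_id).
# Tokens that may repeat in one record (here "arg") get several rows.
TABLE_TEMPLATE = [
    ("header", 100),
    ("path", 200),
    ("arg", 300),
    ("arg", 310),
    ("subject", 400),
    ("return", 500),
]


def convert_record(tokens):
    """tokens: list of (token_id, fields) pairs in order of appearance.

    Returns a list of (field_id, value) pairs ordered by table row,
    not by the order in which tokens appeared in the record.
    """
    # Before each record is processed, every pointer is nulled.
    rows = [[token_id, base, None] for token_id, base in TABLE_TEMPLATE]

    for token in tokens:
        for row in rows:
            if row[0] == token[0] and row[2] is None:
                row[2] = token  # first free row for this token ID
                break
        # Assumption of this sketch: a token with no free row is ignored.

    # Copy field by field, in table order.
    out = []
    for _token_id, base, token in rows:
        if token is not None:
            for offset, value in enumerate(token[1]):
                out.append((base + offset, value))
    return out
```

For example, `convert_record([("header", ["ioctl"]), ("arg", [2, "cmd"]), ("path", ["/dev/tty"]), ("arg", [3, "arg"])])` yields the fields in table order, with the path fields ahead of both arg tokens even though the path appeared between them. Nothing in the loop asks whether that token sequence is one the event could legitimately produce.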



IDIOT. IDIOT’s canonical form is slightly more abstracted from the underlying
audit format than is NADF. IDIOT defines a set of attribute names and picks 
out field values of interest from BSM tokens as they are encountered, binding 
them to the corresponding IDIOT attributes. The mapping is done by a script 
that reads the line-oriented output of praudit, a Sun tool that renders the BSM
data in readable text, one BSM token per line. 

USTAT. USTAT also defines its own abstract event record, whose attributes 
(with a few exceptions) are bound to fields of BSM tokens as shown in Fig. 4.5 
of [5], provided those tokens appear in the incoming event record. 

4.3 Consequences 

Invalid Input Detected Late or Not at All. A strategy of simply copying 
data from tokens as they appear, best seen in the ASAX source, will not detect
ungrammatical input, such as an event header followed by an impossible se- 
quence of tokens for that event. An invalid stream of native data can be silently 
transformed to an invalid stream in canonical form, leaving the problem to be 
detected in a later processing step, if at all. 
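
To make the contrast concrete, the sketch below sets a copy-only canonicalizer beside one that first checks the record against a per-event token sequence. The event name, the token names, and the small pattern notation (a trailing `?` for an optional token) are assumptions made for illustration; they are not taken from BSM or from any of the three systems.

```python
# Hypothetical illustration: copy-only canonicalization versus a grammatical check.

# Expected token sequence per event; a trailing "?" marks an optional token.
EVENT_GRAMMAR = {
    "open": ["path", "attr?", "subject", "return"],
}


def copy_only(event, tokens):
    """Copy whatever tokens appear; never reject anything."""
    return {"event": event, "tokens": list(tokens)}


def matches(pattern, tokens):
    """True if tokens is exactly the pattern, optional items present or absent."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head.endswith("?"):
        name = head[:-1]
        if tokens and tokens[0] == name and matches(rest, tokens[1:]):
            return True
        return matches(rest, tokens)
    return bool(tokens) and tokens[0] == head and matches(rest, tokens[1:])


def checked(event, tokens):
    """Reject a token sequence that the grammar does not allow for this event."""
    tokens = list(tokens)
    if event not in EVENT_GRAMMAR or not matches(EVENT_GRAMMAR[event], tokens):
        raise ValueError(f"ungrammatical record for event {event!r}: {tokens}")
    return copy_only(event, tokens)
```

With the impossible sequence `["return", "path", "path"]`, `copy_only("open", ...)` returns a canonical record without complaint, while `checked("open", ...)` raises `ValueError` at the point where the native data is read.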



Semantic Interpretation Left to Later Stages. The canonicalizers used 
by ASAX, IDIOT, and USTAT defer, to varying degrees, details of the native 
audit format to be handled by the later, ostensibly less platform-specific, stages 
of analysis. The situation is clearest in ASAX, whose canonical form is nothing 
more than the native form with fields reordered and padded for aligned access. 
The specific significance of information within the fields is left to be spelled out 
in RUSSEL, the language of ASAX rules. Dealing with native format issues in 
rules complicates porting those rules, even to other flavors of UNIX whose audit 
formats differ even though the system calls and their semantics are the same. 
Format idiosyncrasies of a given field or event must also be handled in each rule 
that involves the field or event, and a rule language may not have convenient 
facilities for functions or subroutines.³

IDIOT and USTAT take on more of the interpretation problem at the time 
of canonicalization, at least picking out certain fields of interest and mapping 
them into a set of attributes intended to be less platform specific. However, the
combination of such selective inclusion of fields with disregard for grammatical
structure can lead to loss of semantic content, as described next.

³ It is possible in ASAX to isolate some format-specific processing in functions written
in a traditional programming language, linked to ASAX, and invoked in RUSSEL
rules.

Lost Syntactic Cues to Meaning. Consider the BSM grammar rules in Fig. 1. 
To determine whether an ioctl system call was applied to a socket, another
non-file, a file descriptor with no cached name, or an ordinary good file descriptor
requires careful consideration of what tokens appeared in the record. If a parser 
is used, subsequent analysis logic has only to look at the parse tree to determine 
which rule matched the input. Absent a parser, some analysis effort equivalent 
to deciding which rule would have matched is deferred to later processing stages. 
Again, in ASAX, the work can be done by RUSSEL rules that are tightly bound 
to the details of the native format, explicitly testing for the presence of certain 
fields or for distinctive field values. 



ioctlGoodFileDescr
    : path (attr)? arg[2,"cmd"] arg[3,"arg"] ( arg[2,"strioctl:vnode"] )?
ioctlSocket
    : (socket)? arg[2,"cmd"] arg[3,"arg"]
ioctlNonFile
    : arg[1,"fd"] arg[2,"cmd"] arg[3,"arg"]
ioctlNoName
    : arg[1,"no path: fd"] ( attr )? arg[2,"cmd"] arg[3,"arg"]
      ( arg[2,"strioctl:vnode"] )* // have seen 0, 1, or 2 of these



Fig. 1. ANTLR grammar rules for a portion of a system call record. The parser’s 
decision which rule to apply distills semantic content from a series of specific tests 
that would otherwise be left to later processing stages 

In IDIOT or USTAT the situation is complicated by the mapping of native 
fields into canonical attributes. In Fig. 1, if the optional attribute and vnode 
arguments are absent, determining which of the last two rules to apply hinges 
on the text string of the first arg token. Because the text in an arg token is a 
constant string serving only as a syntactic cue, it is not copied to the canonical 
form used by IDIOT or USTAT. As a result, the semantic content conveyed in 
a parse of the native format cannot be recovered at all in IDIOT or USTAT, an 
example of the semantic sacrifice risked when grammatical structure is ignored. 
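
The loss can be illustrated with a small sketch. It models tokens as tuples, with `("arg", n, text)` standing for `arg[n,"text"]` in Fig. 1 (the argument value is omitted for brevity), and decides the rule from the leading token only. The function names and the tuple representation are hypothetical, and `strip_arg_text` is only a stand-in for a canonical form that does not carry the text cue; it is not the actual mapping used by IDIOT or USTAT.

```python
# Hypothetical sketch: which rule of Fig. 1 matches an ioctl record, and what
# is lost when the constant text of arg tokens is not carried forward.


def classify_ioctl(tokens):
    """Return the name of the matching Fig. 1 rule, or None.

    Only the leading token is examined here; a real parser would also
    verify the remainder of the record.
    """
    if not tokens:
        return None
    first = tokens[0]
    if first[0] == "path":
        return "ioctlGoodFileDescr"
    if first[0] == "socket":
        return "ioctlSocket"
    if first[0] == "arg":
        cue = (first[1], first[2])
        if cue == (1, "fd"):
            return "ioctlNonFile"
        if cue == (1, "no path: fd"):
            return "ioctlNoName"
        if cue == (2, "cmd"):
            return "ioctlSocket"  # the socket token is optional in Fig. 1
    return None


def strip_arg_text(tokens):
    """Model a canonical form that keeps arg numbers but not the text cue."""
    return [t[:2] if t[0] == "arg" else t for t in tokens]


non_file = [("arg", 1, "fd"), ("arg", 2, "cmd"), ("arg", 3, "arg")]
no_name = [("arg", 1, "no path: fd"), ("arg", 2, "cmd"), ("arg", 3, "arg")]
```

Here `classify_ioctl(non_file)` and `classify_ioctl(no_name)` name different rules, yet `strip_arg_text(non_file) == strip_arg_text(no_name)`: once the text is dropped, the two cases are the same sequence and no later stage can tell them apart.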

Nonconservative Transformations. The problem of dangerous transforma- 
tions must be considered in any audit project, grammar-based or not. However, 
the systems described in this section offer examples to illustrate this important
issue.

IDIOT relies on Sun’s praudit tool to preprocess the binary log format into
a text representation. The transformations made by praudit go beyond repre- 
senting the binary log. User and group IDs, for example, are presented as user 
and group names, and the transformation reflects the name-to-ID mapping in 
effect at the time praudit runs, not at the time of the audited event. That trans- 
formation can be disabled by a praudit option, but others cannot. IP addresses, 
for example, are displayed as domain names by praudit, a transformation that 
reflects the state of the Domain Name System[9] at the time the tool is run, not 
when the event was logged.⁴

Section 4.1.2.6 of the USTAT document[5] describes another nonconservative
transformation. BSM records often present path names in a non-canonical
form such as /etc/../usr/share. USTAT’s preprocessor, accordingly, includes
a “filename correcting routine,” the wisdom of which can be questioned on two
grounds.⁵ First, such a transformation cannot be said to conserve correctness
without knowing the state of the affected file systems, including symbolic links 
and any cycles in the directory graph reflecting accidental or deliberate file sys- 
tem corruption at the time the event was logged. Second, the exact form of the 
path name appearing in the log reflects the kernel’s construction of the path 
from a process root and working directory and any symbolic links encountered; 
it conveys part of how the event came to pass, and may have forensic value. 
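
The first objection can be demonstrated with a short sketch. It uses Python's purely lexical `posixpath.normpath` as a stand-in for a filename-correcting routine (this is not USTAT's routine) and compares the result with what the file system actually resolves when a symbolic link is involved. The directory and file names are invented, and the sketch assumes a POSIX system on which symbolic links can be created.

```python
# Hypothetical sketch: lexical path "correction" versus actual resolution.
import os
import posixpath
import tempfile


def lexical_fix(path):
    """Collapse '..' components textually, without consulting the file system."""
    return posixpath.normpath(path)


def demo():
    """Return (lexically corrected, actually resolved) paths relative to a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)
        os.makedirs(os.path.join(base, "a", "real"))
        # "link" is a symbolic link to the directory a/real.
        os.symlink(os.path.join(base, "a", "real"), os.path.join(base, "link"))

        logged = os.path.join(base, "link", "..", "target")
        lexical = lexical_fix(logged)      # treats "link/.." as a no-op
        actual = os.path.realpath(logged)  # follows the link before applying ".."
        return os.path.relpath(lexical, base), os.path.relpath(actual, base)
```

On such a system `demo()` returns `("target", "a/target")`: the textual correction names a different file from the one the logged path denoted. Which answer is right depends on the state of the file system when the event was logged, which the log consumer generally cannot know.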

Audit records are often consulted in cases where the integrity of the system 
that produced those records is in doubt. It seems prudent, in transformations 
applied to those records after the fact, to avoid unnecessary assumptions about 
the state of the system that produced them. The same concern need not apply to 
transformations reliant on mappings that are widely known and independent of 
any single computer system. Those transformations, such as IP protocol numbers 
to the names of those protocols, may arguably be used wherever they would be 
useful. 

4.4 Discussion 

The intrusion detection systems just described all translate native BSM audit 
logs into some canonical form without looking at grammatical structure. They 
do so, however, by defining canonical forms that offer little semantic support 
to later analysis stages. The proposal by Bishop[1] is another example of such
a canonical form. Where such a form is used, the authors of rules or transition
diagrams must still account for what is meant in the original native form when,
for example, a certain field is absent from a record. The rules, therefore, become
platform specific.⁶

⁴ A more fundamental, equally fatal, but less enlightening nonconservative transformation
applied by praudit is the presentation of the log data in a delimited form
with no escaped representation for occurrences of the delimiter in the data. This
feature alone rules out any role for praudit in a reliable consumer of BSM logs.

⁵ This transformation is also done by praudit.

⁶ While this paper was in preparation, a portion of the EMERALD project [6], eXpert-BSM,
became available for review. Unfortunately, the distribution terms prohibit
Such lazy translation is not even an option if the target representation, like
CISL, intends to convey the semantic nuances revealed by a careful parse of the 
original. A translator that overlooks or mistranslates those nuances will produce 
a false translation that may lead later analysis into error. 

Finally, canonical or intermediate representations of audit data should be 
scrutinized for assumptions that would require nonconservative transformations 
during conversion. An example would be a canonical form that identifies ma- 
chines by domain names, if IP numbers are used in the native form. 

5 Appropriateness of a Grammar Approach 

At least two objections may be considered to a grammar representation of a 
logging format. 

5.1 Efficiency 

Logs are voluminous and efficiency in their processing is important. Developers 
observing that constraint may lean toward ad hoc and handcrafted techniques 
and away from strict attention to grammatical structure. Section 4, however, 
showed that the price of parsing, if saved up front, must be paid later if the full 
information content is to be extracted from the input. In fact the price is paid 
with interest, as a single test and decision not made on the initial parse of the 
data may have to be duplicated in many rules that apply to the same records. 
It was not in the scope of this work to build otherwise-comparable intrusion 
detection systems and obtain a performance comparison, but these observa- 
tions, coupled with the importance of reliability and maintainability, suggest 
that grammar techniques in audit processing should not be dismissed out of 
hand on efficiency grounds. 

5.2 Applicability 

The foregoing discussion breaks down unless it is reasonable to expect that audit 
logs can be described by grammars in the classes that enjoy efficient parsing 
algorithms. 

A distinction must first be made, just as in the specification of programming 
languages. A grammar like that given in the Java specification [3] does not pur- 
port to describe the language “all semantically reasonable Java programs”; it 
describes the simpler language, “syntactically valid Java programs.” The gram- 
mar, therefore, is a specification with a deliberately limited scope. Aspects of 
the language excluded from its scope fall into two broad categories: 

reverse-engineering to discern just how the log is processed, but [7] presents some 
sample detection rules for this newer tool and here again, rules that describe attacks 
applicable to UNIX systems generally must be written to the specifics of the BSM 
format. 
Unspecified Aspects. Some aspects of a language are not addressed by any
part of the specification. For example, the Java grammar imposes no structure 
on a methodBody or other block, other than that it be some sequence of zero 
or more blockStatements. The statements themselves, and their subproductions, 
are explicitly specified, but it is considered beyond the scope of a language 
specification to characterize how those statements might be placed in meaningful 
blocks by programmers. 



Aspects Specified Extragrammatically. Some details of the language are 
explicitly specified elsewhere. For example, Java’s official grammar is Chapter 
19 of the Java specification. Other chapters, in prose or grammar-like notation, 
contain requirements not embodied in the grammar itself, such as those for 
casts and parenthesized expressions, or field and method modifiers. Therefore, 
the grammar describes a superset of conforming programs, which must be culled 
after parsing by enforcing the extragrammatical requirements. 

For Java, two factors contributed to the exclusion of these details from the 
grammar itself: the choice to provide a grammar no more complex than can be 
parsed left to right without backtracking and with only one token of lookahead, 
and the choice to adopt a C-like syntax, which includes constructs that cannot 
be parsed that way. 



Minimizing Extragrammatical Requirements. The need for the second 
kind of scope restriction can be reduced by relaxing restrictions on the grammar
to be provided. For example, ANTLR supports LL(k) grammars for configurable
k with predicates (a form of localized backtracking)[12], and comes with
a k = 2 Java grammar that explicitly embodies requirements for casts, etc., that
had to be left out of the official LALR(1) Java grammar.

If not constrained to perpetuate difficult features of an existing language, a 
designer can so craft a new language that a simple, efficiently parsed class of 
grammar is adequate to specify it, and few or none of its syntactic features need 
to be specified extragrammatically. The designer of a new audit logging format 
is in such a position. 



Application to Audit Logs. The specification for an audit log, like that 
for a programming language, may be deliberately restricted in scope. While 
each individual event, and its subproductions, should be explicitly specified, the 
sequences in which events may appear in actual use of the system depend on 
user and program behavior, and their easy characterization a priori is unlikely. 
As with a block of “zero or more statements,” the simple “zero or more events of 
any type” is a permissive superset of the expected event sequences and presents 
no difficulty in parsing. 

The individual event records are produced by code that must execute when, 
and only when, the corresponding events take place. Necessary restrictions on 
the logging code (e.g. termination guarantees) limit its complexity and, with 
it, the complexity of the grammar required to describe the record, even if the
format was not designed with a specific grammar class in mind. The commercial 
log format described in the next section was successfully described in ANTLR 
notation with 1-token lookahead for most choices. The predicates required at 
other choice points all amount to constant-depth additional local lookahead. 

6 Formalizing the BSM Audit Format 

This work began when a robust consumer for Sun’s Basic Security Module (BSM) 
audit log format[8] was needed for another project. The existing documentation 
on the format was transcribed into a grammar notation. Before a parser could 
be generated to test the grammar against actual logs, it was necessary to modify 
the grammar to resolve all ambiguities detected by the parser generator. The 
grammar was then iteratively refined by generating a parser, running it on actual 
logs, and observing parse errors. A parse error could represent an error in the 
BSM documentation, or a fault in the Solaris log production code. It could be 
resolved for the next iteration by modifying either the grammar or the Solaris 
code. For this project, modifying the code was not an option, so all parse errors 
were resolved by modifying the grammar, leading ultimately to a grammar that 
describes closely the log that Solaris actually produces, even in instances that 
seem unintended. 

The resulting grammar contains 327 named, nonterminal rules. Examination 
shows that the rules are associated in a straightforward manner with the 267 
kernel and user event types and 41 token types found in the Solaris 2.6 system 
files, and follow the Sun documentation with only necessary departures. That 
is, the number of rules does not reflect an especially obfuscated grammar but, 
rather, an indication of the intrinsic complexity of the audit log alluded to in 
Sect. 1. By comparison, the example grammar supplied with ANTLR for the 
Java 1.1 programming language includes 64 such rules. 

The remainder of this section will discuss selected examples of the flaws 
or ambiguities in BSM documentation or implementation that were detected 
by this methodology. The entire grammar, with annotations describing discrep- 
ancies, can be downloaded from http://www.cerias.purdue.edu/software/ 
with the other files needed to compile and run a working BSM parser. To print 
the grammar as an appendix would be impractical because of its size, and would 
sacrifice the hyperlinks that connect the grammar rules to the corresponding 
sections of Sun’s BSM documentation. 

6.1 Difficulties Detected by Static Analysis in Parser Generator 

Non-LL(1) Constructs. Many of the ambiguities that were automatically detected
simply reflected features of the BSM log format that cannot be recognized
by an LL parser with one lookahead token; they were resolved by adding explicit
lookahead at strategic places in the grammar. They do not reflect inherent
ambiguity in the log format, but nevertheless are possible pitfalls for developers
who attempt to develop a straightforward BSM consumer tool from the
documentation without a parser generator’s rigorous analysis.



Constructs Resolvable with Semantic Information. Compiling a naive 
version of our BSM grammar will result in 16 warnings of ambiguity apparently 
inherent in the log syntax, any one of which would suffice to dash the hope 
of reliably processing BSM logs, whether by a conventional parser or by any 
other means. Although it is impossible in these cases to determine the correct 
grammar rule to apply from the sequence of BSM token types alone, they can be 
resolved by looking explicitly into the values carried by certain of those tokens, 
an operation known in ANTLR terms as a “semantic predicate.” Specifically, a 
BSM ‘arg’ token contains a data field whose value is necessary and sufficient to 
resolve these 16 cases. The data values, which are constant character strings, are 
shown in the printed documentation, albeit without an explanation that they are 
essential at parse time to properly interpret the log. Both IDIOT and USTAT 
appear to discard these values in the conversion to canonical form, perhaps on the 
assumption that a token field whose value is constant does not convey essential 
information. 
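The idea of a semantic predicate can be sketched outside ANTLR as well. The following Python fragment is a minimal illustration, not the grammar's actual code: the token class, the two rule names, and the strings "cmd" and "flags" are hypothetical stand-ins for the constant strings carried by BSM 'arg' tokens.

```python
from typing import NamedTuple, Sequence


class Token(NamedTuple):
    kind: str    # token type, e.g. "arg" or "text"
    value: str   # data carried by the token


# Hypothetical alternatives: both rules begin with an 'arg' token, so the
# token type alone cannot tell them apart; the constant string inside can.
ALTERNATIVES = {
    "cmd": "rule_with_command",
    "flags": "rule_with_flags",
}


def choose_rule(tokens: Sequence[Token], pos: int) -> str:
    """Pick a rule by inspecting the value of the lookahead token."""
    tok = tokens[pos]
    if tok.kind != "arg":
        raise SyntaxError(f"expected 'arg' token at {pos}, found {tok.kind!r}")
    if tok.value not in ALTERNATIVES:
        raise SyntaxError(f"unrecognized arg text {tok.value!r} at {pos}")
    return ALTERNATIVES[tok.value]
```

A consumer that discards the token value before this decision, as a canonical-form conversion might, has nothing left to base the choice on.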



True Ambiguity. It may not be surprising, given the complexity of what BSM 
must log and the lack of formal analysis in its original design, that a few ambi- 
guities remain. Instead of reflecting limitations of a particular parsing technique 
they are, if the BSM documentation is correct, inherent in the log format. Fig- 
ure 2 is an example. 

The description with two optional ‘text’ tokens followed by two mandatory 
ones leads to a formal ambiguity if an event record has exactly three text tokens 
following the header. It is clear in that case that one of the two optional text 
tokens is present, but the parser cannot determine whether it is the driver major 
number or the driver name. The ambiguity cannot be resolved, even with a 
semantic predicate, unless there is a way to tell decisively by looking at the text 
string whether it is a driver major number or a driver name. Perhaps the number 
is always a text string of only digits and the name must begin with a non-digit, 
but this should be stated in the BSM documentation if programs are expected to 
depend on it. Or, it may be that the documentation is mistaken in showing the 
number and name as being independently optional: perhaps it should be “[text 
text] text text” with the first two both there or both absent. If that is the case, 
the documentation should be corrected. 

Without access to the intent of the BSM developers, the grammar was mod- 
ified to embody the last interpretation, which is reasonable and conservative 
under the circumstances. It will work if the first two ‘text’ tokens are both 
present and if they are both absent. If an instance is encountered of the ambigu- 
ous case with one of the two present, a parse exception will be signaled, avoiding 
an undetected misinterpretation. 
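The ambiguity can be made concrete by counting the ways a run of text tokens fits each reading of the rule. The sketch below is illustrative only: it models just the four text fields, and the function names are hypothetical rather than taken from the BSM grammar.

```python
from itertools import product


def parses_independent(n_text):
    """Readings of n_text tokens under '[text] [text] text text'.

    Each result is (major_number_present, name_present).
    """
    return [
        (has_major, has_name)
        for has_major, has_name in product([False, True], repeat=2)
        if has_major + has_name + 2 == n_text
    ]


def parses_paired(n_text):
    """Readings of n_text tokens under '[text text] text text'."""
    return [
        (present, present)
        for present in (False, True)
        if 2 * present + 2 == n_text
    ]
```

With three text tokens, `parses_independent(3)` yields two readings (the ambiguous case), while `parses_paired(3)` yields none, which corresponds to the parse exception described above.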
6.2 Difficulties Detected in Testing

After the statically-detectable problems were resolved, the grammar was repeat- 
edly used to generate a parser. The parser was applied to a collection of 2.2 
megabytes of BSM audit data obtained in-house and from other institutions, 
from SunOS and Solaris systems as recent as Solaris 2.6. Two general classes of 
discrepancy were detected between the BSM documents and the actual logs. 

Undocumented Records. Some records were encountered in the sample logs 
that simply do not appear in the documentation. Corresponding rules were added 
to the grammar to allow the logs to be parsed. Fig. 3 is an example. 



AUE_CONNECT
    : %AUE_CONNECT socket socket subj ret


Fig. 3. ANTLR grammar rule for an event record that appears in our sample 
logs but is not documented 

Misdocumented or Misimplemented Records. Some records were encoun- 
tered for event types that were documented, but parse errors were detected 
because the records did not conform to the published format. Fig. 4 is an exam- 
ple. So that the logs could be parsed, the affected grammar rules were changed 
from direct transcriptions of the documentation to reflect the records actually
encountered. 



Event Name   Program       Event ID   Event Class   Mask
AUE_su       /usr/bin/su   6159       lo            0x00001000

Format:
header-token
text-token        (error message)
subject-token
return-token

Fig. 4. Description of a record from [8]. In actual audit logs examined in this
work, the text and subject tokens appear in the reverse order



6.3 Difficulties Not Automatically Detected 

Figure 5 illustrates a point where the published BSM documentation is incom- 
plete, and hence the interpretation of a log record is not completely determined, 
but the formal method described in this work could not detect the problem. 
The problem was recognized, however, during the process of transcribing the 
documentation into a grammar, and the discipline of that process may have 
contributed to that recognition. 

Two tokens are shown as optional: the file attributes for the source file, and 
the rename destination path. The rule is quite readily parsable, but has two sus- 
picious features. First, the optional tokens are shown as independently optional, 
implying four possible record variants for the rename event. In actual logs, only 
two — both tokens present, both absent — have been observed. The documenta- 
tion may be incorrect, but the methodology will not detect the problem. The rule 
as stated presents no parsing difficulty that would be detected in static analysis, 
and, if incorrect, it matches a superset of the records that can be encountered, so 
no parse error will be produced. Nevertheless, it should spur any conscientious 
developer of a log consumer to wonder exactly what should be inferred about the 
state of the audited system when each of the — as written — four variant forms is 
encountered. 

The second suspicious feature is that the destination path is shown as op- 
tional at all. It is absent in our samples only when the file attributes are absent 
also, which seems to happen only when the source file is not found. The feature
may be an artifact of some implementation detail within the rename system
call. It might be worth changing, however. An intrusion detection system might 
recognize a certain intrusion attempt from a rename with a specific destination. 
Detection could be delayed if the intruder mistypes the source file name the first 
time, causing the recognizable destination path to be omitted from the record. 

7 Methodological Recommendations 

The ideal time to apply the ideas of this work would be during the design of the 
audit log format for a new system. A new log format can be designed to fall in 
a language class that is easily parsed with modest lookahead, and specification 
ambiguities detected by static analysis can be eliminated before implementa- 
tion. Specification-driven tools can speed implementation and testing, and the 
annotated grammar can be provided as documentation. 

The ideas can still be applied, however, when an existing log format is re- 
viewed for possible improvement, and even in the simple development of tools 
to consume an existing format. In this less ideal setting, too late for the other 
formal-method benefits cited above, the technique has valuable potential for 
improved understanding of the log nuances and more thorough validation and 
verification of the software. It was applied in that way to BSM in this work, 
suggesting a methodology for similar projects. An existing audit format can be 
approached by iterating these four steps: 

1. Prepare a grammar by transcription from whatever documents are available. 
As ambiguities are detected by the parser-generator’s analysis, return to the 
documents, sample logs, experimentation, or system source code (if available) 
to determine if any information present in the log can be used in explicit
predicates to resolve the ambiguities. Also make note of constructs whose 
semantic significance is unclear to the human reviewer, even if not formally 
ambiguous. 

2. When the grammar can be successfully compiled, apply the parser to a good 
sample of audit data and note any parsing diagnostics. Determine whether 
these represent flaws in the log documentation, the logging implementation, 
the grammar, or combinations of these. If this process is undertaken by a 
vendor, whatever needs to be corrected can be. Otherwise, options may be 
limited to suggesting fixes or documenting the issue and complicating the 
grammar. 

3. Given a grammar that successfully describes the logs, scrutinize it for the 
semantic nuances of the rules. Choice points in the grammar always have 
semantic significance: because log records are produced by an automaton, 
the production of one of several forms of a record depends on and conveys 
information about the state of the system. An event record whose grammar 
rule shows three optional fields, for example, can make eight distinguish- 
able statements about a particular event and the system state in which it 
occurred, beyond what is conveyed by field values. If the eight semantic nu- 
ances are not clear, return to documents, experiments, or source code until 
a satisfactory account of them can be made, or until the grammar rule can 
be tightened to imply fewer cases. 

4. Update grammar, documentation, or code as necessary and possible, and 
repeat. 
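Step 3 can be supported by a small bookkeeping aid. The sketch below, under the assumption that each record form is summarized by the set of optional fields it contains, enumerates the 2^n forms a rule admits and lists those never seen in sample logs. The function names and the field names "attr" and "path" are hypothetical.

```python
from itertools import combinations


def record_variants(optional_fields):
    """All distinguishable record forms implied by independently optional fields."""
    fields = list(optional_fields)
    return [
        frozenset(combo)
        for size in range(len(fields) + 1)
        for combo in combinations(fields, size)
    ]


def unexplained_variants(optional_fields, observed):
    """Forms the grammar rule admits but that never occur in the observed logs."""
    seen = {frozenset(form) for form in observed}
    return [form for form in record_variants(optional_fields) if form not in seen]
```

For a rule with two optional fields where only the both-present and both-absent forms were observed, `unexplained_variants(["attr", "path"], [[], ["attr", "path"]])` returns the two single-field forms, which are the candidates for either a semantic explanation or a tighter rule.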

8 Future Work 

— BSM and other auditing systems can have configuration options that control 
the inclusion or omission of certain optional fields in some records. Our 
grammar was tested on audit logs produced on systems with similar settings 
for those options. A single grammar could rapidly grow unwieldy if extended 
to accept the logs produced under all settings of the configuration options. 
Environment grammars[13] address the problem of parsing such classes of 
similar languages as efficiently as context-free languages, and could offer a 
cleaner solution. 

As it happens, the difference between an environment-grammar parser and 
an ANTLR parser resides entirely in the analysis algorithms used during 
parser generation. The structures and features required at run time by a 
parser specified by an environment grammar are exactly those of a parser 
generated by ANTLR. 

— The modest cost of careful parsing might be further discounted in a self- 
contained application where the exact information needed from the log is 
known in advance. For example, a self-contained intrusion detection system 
might compile its rule base together with the full log grammar, producing a 
parser that skims lazily where possible. 
Abstract. We present a pattern matching approach to the problem of
misuse detection in a computer system, which is formalized as the problem
of multiple approximate pattern matching. This permits very fast
searching for potential attacks. We study the probability of matching of
the model and its relation to the filtering efficiency of potential attacks
within large audit trails. Experimental results show that in a worst case,
up to 85% of an audit trail may be filtered out when searching a set of
attacks without probability of false negatives. Moreover, by filtering 98%
of the audit trail, up to 50% of the attacks may be detected.



1 Introduction 

Research in intrusion detection has emerged in recent years as a major subject 
in the computer security field because of the difficulty of ensuring that informa- 
tion systems are free from security flaws. Computer systems suffer from security 
vulnerabilities regardless of their purpose, manufacturer or origin. It is both tech- 
nically hard and economically costly to ensure that systems are not susceptible 
to attacks. 

Two approaches have been proposed to address the problem: anomaly detection
(see for example [1,2]) and misuse detection (see for example [3]). The
former suggests that a user’s activity in the system can be characterized so that a
profile of “normal utilization” of the system is established and excursions from
this profile are flagged as potential intrusions, or attacks in a more general sense.
The latter assumes that attacks are well-known sequences of actions, called
scenarios or attack signatures, and that the activity of the system (in the form of
logs, network traffic, etc.) may be audited in order to determine the presence of
such scenarios in the system.

Anomaly detection leads to some difficulties: a flow of alarms is generated
when the system environment changes noticeably, and a user can
slowly change his behavior in order to cheat the IDS. On the other hand, misuse
detection becomes an increasingly demanding task in terms of semantics
and processing, as more sophisticated attacks are discovered every day (which
implies an increasing number of sophisticated scenarios to search for in audit
trails). These challenges have led to a research trend aimed at a simplified
representation of the problem in order to improve performance and efficiency of
detection. In the short term, effective intrusion detection systems will incorporate
a number of techniques rather than a “one-strategy-fits-all” approach. The
greater the variety of available tools, the better the IDS.

In this spirit, we introduce an original intrusion detection model inspired by
the misuse detection approach. Its main goal is to provide an intrusion detection
system for fast detection of potential attacks rather than accurate (i.e., exhaustive)
detection of actual attacks. The results of such a detection (i.e., filtered
audit trails, in which attacks may be present) would be used in turn as input for
a more accurate detection algorithm. This idea was already at the root of the
GASSATA IDS, which uses a genetic algorithm with this aim in view [4].

We formalize a concrete instance of the misuse detection problem as a pattern
matching problem which permits very fast searching for potential attacks. We
then study the statistics of this model and their relation to the filtering efficiency of
potential attacks in the resulting system.

Section 2 explains our proposed intrusion detection model and the constraints
of the problem. Section 3 gives analytical and experimental results on the
probability of matching. Section 4 presents our testing system and experimental
results. Finally, conclusions and future work are presented.



2 Intrusion Detection as a Pattern Matching Problem 

In general terms, the misuse detection problem is to detect the existence of a 
priori known series of events within the traces of activity of a system to protect. 

Traces differ widely in their origin, form and content, depending on the type
of potential attacks that they attempt to cover. For example, traces in the form
of network traffic collected by a firewall or a sniffer may be used to detect
well-known attacks on implementations of a TCP/IP protocol stack. Another example
is the logs of commands typed by users of a multi-user computer. In both
cases, traces may be collected at a single place (e.g., an ethernet segment, a host
computer) or at multiple locations simultaneously. We consider the detection of
attacks using logs (audit trails) of commands typed by users of a distributed
computer system. In this context, attacks appear to be typically short sequences
of no more than 8 commands.

We propose to model the misuse detection problem as a pattern matching
problem in the following way: auditable commands in the system can be seen
as letters of an alphabet $\Sigma$ and the audit trail as a large string of letters in $\Sigma$
(i.e., the text). The sequences of events representing attacks to be detected are
then substrings (i.e., patterns) to be located in the main string. Since attackers
may introduce spurious commands among those that represent an actual attack
in order to disperse their evidence, a limited number of spurious letters must
be allowed when searching the pattern. We are interested in simultaneously
searching a set of patterns. Thus, the misuse detection problem can be regarded
as a particular case of the multiple approximate pattern matching problem, where
insertion in the pattern is the only allowed edit operation. Figure 1 illustrates our
proposed model to map the misuse detection problem as a multiple approximate
pattern matching problem. The set of commands in the audit trail and attack
signatures is translated into letters of $\Sigma$. The resulting string and patterns are
passed to a multiple approximate pattern matching algorithm which in turn
searches for the occurrences of substrings in the main string. Matches represent
potential attacks in the audit trail.

[Figure 1 content: commands mapped to letters (chmod a, mknod c, xterm d, mv e,
cp f, touch g, mail h); attack 1: cdeb; attack 2: faghi; multiple pattern matching
algorithm; reported matches: [cdeb] [65], [cdeb] [163], [faghi] [32], [cdeb] [115],
[faghi] [99]]

Fig. 1. Attack signature searching as a multiple approximate pattern matching
problem

We formalize the above problem as follows: Our text, $T_{1..n}$, is a sequence
of $n$ characters from an alphabet $\Sigma$ of size $\sigma$. Our patterns, $P^1 \ldots P^r$, are (short)
sequences of characters from $\Sigma$. Let us consider such a pattern $P_{1..m}$ of length $m$.
We want to report all the text positions that match $P$, where at most $k$ insertions
between characters of $P$ are allowed in its occurrence in $T$. We call $\alpha = k/m$ the
“error level”.

Our problem can be modeled using the concept of insertion distance. The
insertion distance from $a$ to $b$, denoted $id(a, b)$, is the number of insertions
necessary to convert $a$ into $b$. We say that $id(a, b) = \infty$ if this is not possible. Clearly,
$id(a, b) = |b| - |a|$ if $a$ is a subsequence of $b$, and $\infty$ otherwise.

We search for the pattern $P$ in a text $T$ allowing insertions. At each text
position $j \in 1..n$ we are interested in the minimum number of insertions needed
to convert $P$ into some suffix of $T_{1..j}$. This is defined as

$$lid(P, T_{1..j}) = \min_{j' \in 1..j} id(P, T_{j'..j})$$

The search problem can therefore be formalized as follows: given $P^1 \ldots P^r$, $T$
and $k$, report all text positions $j$ such that $lid(P^i, T_{1..j}) \le k$ for some $i \in 1 \ldots r$.
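The definitions above can be turned into a direct, unoptimized reference implementation. The sketch below is only a reading of the definitions (insertion distance as a subsequence test, and its minimum over all suffixes ending at position j); it is not the bit-parallel algorithm of [5], and the function names are chosen here for illustration.

```python
import math


def insertion_distance(a, b):
    """Insertions needed to turn a into b; infinite if a is not a subsequence of b."""
    remaining = iter(b)
    if all(ch in remaining for ch in a):
        return len(b) - len(a)
    return math.inf


def lid(pattern, text, j):
    """Minimum insertion distance from pattern to any suffix of text[0:j] (j is 1-based)."""
    i = len(pattern) - 1          # next pattern letter to match, from the right
    pos = j - 1                   # 0-based index of the j-th text character
    while pos >= 0 and i >= 0:
        if text[pos] == pattern[i]:
            i -= 1
        pos -= 1
    if i >= 0:
        return math.inf           # pattern is not a subsequence of text[0:j]
    window_length = j - (pos + 1)  # shortest window ending at j that contains pattern
    return window_length - len(pattern)


def search(patterns, text, k):
    """Report (position, pattern) pairs whose lid is at most k."""
    return [
        (j, p)
        for j in range(1, len(text) + 1)
        for p in patterns
        if lid(p, text, j) <= k
    ]
```

Matching greedily from the right finds the shortest window ending at j, which is why a single backward scan suffices for the minimum. For example, `search(["cdeb", "fag"], "xcdxebfzag", 1)` reports `cdeb` at position 6 and `fag` at position 10, each with one spurious letter.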

In [5], two different multipattern search algorithms specifically tailored
for this pattern matching problem are presented. They are based on
“bit-parallelism”, a technique to represent the state of the search using the bits of
a computer word of $w$ bits (typically $w = 32$ or $64$). The basic algorithm takes
$O(nm \log(k)/w)$ time to scan the text for one pattern. A first multipattern search
algorithm is $O(nr(1+\alpha)^{1+\alpha}/(\sigma\alpha^{\alpha}))$, which is better than $r$ applications of the
basic algorithm whenever $\alpha < (\sigma/e) - 1$. A second multipattern search algorithm
takes $O(nr \log(m+k)/w)$ time and is useful for $\alpha < (\sigma/m) - 1$.

It is shown in [5] that the algorithms can achieve impressive scanning speeds
in practice. For example, they show the case of 4-letter patterns searched allowing
4 insertions, which is a case of interest in intrusion detection applications. On
a Sun Enterprise 450 server, these algorithms allow searching for 100 patterns
at a rate of 4 megabytes per second.

The focus of this paper lies in the probabilistic model for the occurrences of 
patterns in text when insertions are allowed, and its relation to the problem of 
false detection of nonexistent attacks (i.e., “false positives”) and misdetection 
of existing attacks (i.e., “false negatives”). By characterizing this relation, the 
optimal filtering efficiency of the model may be determined. 

3 Probability of Matching 

We start by giving an upper bound on the matching probability of a random
pattern of length $m$ at a random text position, with up to $k$ insertions. Consider
a random text position $j$. The pattern $P$ appears with $k$ insertions at a text
position ending at $j$ if and only if the text window $T_{j-m-k+1..j}$ contains the $m$
pattern letters in order. The window positions that match the pattern letters
can be chosen in $\binom{m+k}{m}$ ways. Those letters are fixed but the other $k$ can take
any value. Therefore the probability that the text window matches the pattern
with $k$ insertions is at most

$$\binom{m+k}{m} \frac{1}{\sigma^m} \qquad (1)$$

where we are overestimating because not all the selections of window positions
give different windows. For instance the pattern "abcd" matches in text window
"abccd" with $k = 1$ in two ways, but only one text window should be counted.
In particular, our overestimation includes the case of $k' < k$ insertions, which is
obtained by selecting the first $k - k'$ characters of the text window as insertions
and distributing the $k'$ remaining insertions in the remaining text window of
length $m + k'$.

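An asymptotic simplification (for large $m$ and $\alpha = k/m$ considered constant)
of the cost can be obtained using Stirling’s approximation to the factorial,
$m! = (m/e)^m \sqrt{2\pi m}\,(1 + O(1/m))$:

$$\binom{m+k}{m} \frac{1}{\sigma^m} = \left( \frac{(1+\alpha)^{1+\alpha}}{\sigma \alpha^{\alpha}} \right)^{m} \Theta\!\left( \frac{1}{\sqrt{m}} \right) \qquad (2)$$

which, as $\alpha$ moves from zero, grows from $1/\sigma^m$ to 1. To determine where the
probability reaches 1, we require that $\sigma \alpha^{\alpha} \le (1+\alpha)^{1+\alpha}$, i.e., $\sigma \le (1+\alpha)(1+1/\alpha)^{\alpha}$.
A sufficient condition can be obtained by noticing that $1 < (1+1/\alpha)^{\alpha} < e$,
and therefore $\alpha > (\sigma/e) - 1$ suffices.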
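The Stirling step behind Eq. (2) can be spelled out as follows. This is a reconstruction, assuming that the bound of Eq. (1) is the number of ways to place the $m$ pattern letters in a window of $m+k$ positions divided by $\sigma^m$, with $\sigma$ the alphabet size and $k = \alpha m$; it is not quoted from the original derivation.

```latex
\begin{align*}
\binom{m+k}{m}
  &= \frac{((1+\alpha)m)!}{m!\,(\alpha m)!}, \qquad k = \alpha m \\
  &\approx \frac{\left(\frac{(1+\alpha)m}{e}\right)^{(1+\alpha)m}\sqrt{2\pi(1+\alpha)m}}
               {\left(\frac{m}{e}\right)^{m}\sqrt{2\pi m}\;
                \left(\frac{\alpha m}{e}\right)^{\alpha m}\sqrt{2\pi\alpha m}} \\
  &= \left(\frac{(1+\alpha)^{1+\alpha}}{\alpha^{\alpha}}\right)^{m}
     \sqrt{\frac{1+\alpha}{2\pi\alpha m}}
\end{align*}
```

Dividing by $\sigma^m$ moves $\sigma$ into the base of the $m$-th power, so for fixed $\alpha$ the bound decays or grows exponentially in $m$ according to whether $(1+\alpha)^{1+\alpha}/(\sigma\alpha^{\alpha})$ is below or above 1; the square-root factor only contributes a polynomial correction.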

This means that a model based on insertions can be useful only if we keep $k$
reasonably low, i.e., $k < m((\sigma/e) - 1)$. However, this is a pessimistic analytical
model that needs experimental verification.

We test experimentally the probability that a random pattern matches at a
random text position. We generated a random text and 100 random patterns for
each experimental value shown. Figure 2 (left) shows the probability of matching
in a text of 3 Mb for a pattern with $m = 300$, where pattern and text were
randomly generated over an alphabet of size $\sigma = 68$. The reason to choose such
a long pattern is given shortly.

As can be seen, there is a $k$ value from where the matching probability starts
to grow abruptly, moving from almost 0 to almost 1 in a short range of values.
This phenomenon is sharp enough to make this $k$ value the most important
parameter governing the behavior of the algorithm. We call this point $k^*$, and
$\alpha^* = k^*/m$ the corresponding error level.

On the right part of figure 2 we have shown this limiting $\alpha^*$ value for different
pattern lengths, showing that $\alpha^*$ tends to a constant for large $m$, although it
is smaller for short patterns. The fact that $\alpha^*$ tends to a constant limit when $m$
grows motivated us to use $m = 300$ to show the process at a stable point. On the
other hand, it must be noted that the limit is much lower for short patterns than
its asymptotic value, and therefore the exact combinatorial formula of Eq. (1)
should be preferred, leaving Eq. (2) just as a conceptual tool to understand how
the process behaves in general.

Finally, we show in figure 3 how the alphabet size $\sigma$ affects the $\alpha^*$ value. As
can be seen, the curve looks like a straight line, where least squares estimation
yields $\alpha^* = (\sigma/1.0856) - 0.8878$. Again, this corresponds to long patterns, while
the real values for short patterns should be obtained from the exact formula.

All this matches our analytical results in the sense that (a) there is a clear
error level $\alpha^*$ where the matching probability goes almost from 0 to 1; (b) this
point does not depend on $m$ asymptotically; and (c) it depends on $\sigma$ linearly
as predicted by the analysis ($\alpha^* = (\sigma/e) - 1$), except that the $e$ has been
changed to about 1.09. Interestingly, this is similar to the result obtained for
the $k$ differences problem in [6,7] when relating their analytical predictions
($\alpha^* = 1 - e/\sqrt{\sigma}$) with the experiments ($\alpha^* = 1 - 1.09/\sqrt{\sigma}$), and shows a consistent
behavior of the pessimistic analytical model used in both cases.

Fig. 2. On the left, matching probability for increasing $k$ values and fixed $m = 300$.
On the right, the $\alpha^*$ limit as $m$ grows

Fig. 3. The $\alpha^*$ limit as $\sigma$ grows

4 Experimental Results 

We experimentally studied how the probabilistic model of string matching al- 
lowing insertions relates to the problem of false negatives and positives. Our 
interest is to determine how a* relates to the ratio between false negatives and 
positives and the total number of reported attacks and, consequently, to the 
filtering efficiency of the model. 

The experimental input data consists of an audit trail and an attack database.
Both of them are very simple. The audit trail was collected using the GASSATA
IDS in a real environment. The format of events given to GASSATA is for the
moment an extension of the one proposed in [8]:

#S#version=suntrad5.6#system=SOLARIS#deamon=system#ahost=amstel#no=28#
event=AUE_EXECVE#date=2000.3.14@14.29.41#program=/var/audit/ls#
file=/var/audit/ls#euid=root#egid=other#ruid=root#rgid=other#pid=13949#
error=-1#return=KO#E#I#

#S#version=suntrad5.6#system=SOLARIS#deamon=system#ahost=lancelot#no=29#
event=AUE_EXECVE#date=2000.3.14@14.29.41#program=/usr/bin/ls#
file=/usr/bin/ls#arg=ls,-als#euid=root#egid=other#ruid=root#rgid=other#
pid=13949#error=0#return=OK#E#I#

The attack database consists of attacks signatures with the following format: 

>>> Attack_login
rule1
rule1
rule1
>>> Attack_file_creation
rule2
>>> Attack_ps_cmd
rule3
rule?

Rules are defined in the following way: 

rule1 ::= ((event=AUE_login) || (event=AUE_rlogin)) && (return=KO) ;
rule2 ::= (event=AUE_CREAT) && ((file co ls) || (file co cd)) ;
rule3 ::= (event=AUE_EXECVE) && (program=/usr/bin/ps) ;
rule4 ::= (event=AUE_EXECVE) && (program co crack) ;
rule5 ::= (event=AUE_su) ;

where the co operator stands for “contains”. 

The audit trail and attack signatures are translated into a pattern matching 
representation in three steps. First, a different letter is assigned to each rule (e.g., 
rulel = ’a’). Attack signatures are then translated into patterns by mapping 
their rules to the corresponding letters. Finally, the audit trail is scanned and its 
events are matched against the rules. Events which match more than one rule 
are assigned the corresponding letters. Events which do not match a rule are 
assigned arbitrary letters. The final string is constructed by concatenating the 
sequence of letters corresponding to matches of rules and the arbitrary letters. 
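The three translation steps can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the rule predicates transcribe three of the rules listed earlier, while the letter assignment, the single filler letter "z" for unmatched events, and all function names are assumptions made here.

```python
def parse_event(record):
    """Split a '#key=value#...' record into a dict, skipping bare markers (S, E, I)."""
    fields = record.strip("#").split("#")
    return dict(field.split("=", 1) for field in fields if "=" in field)


# Step 1: one letter per rule (assignment chosen here for illustration).
RULES = {
    "a": lambda e: e.get("event") in ("AUE_login", "AUE_rlogin")
    and e.get("return") == "KO",
    "b": lambda e: e.get("event") == "AUE_CREAT"
    and any(s in e.get("file", "") for s in ("ls", "cd")),
    "c": lambda e: e.get("event") == "AUE_EXECVE"
    and e.get("program") == "/usr/bin/ps",
}
FILLER = "z"  # assumed stand-in for the "arbitrary letters" of unmatched events


def signature_to_pattern(rule_letters):
    """Step 2: an attack signature (sequence of rule letters) becomes a pattern."""
    return "".join(rule_letters)


def trail_to_text(records):
    """Step 3: every event contributes the letters of all rules it matches."""
    text = []
    for record in records:
        event = parse_event(record)
        letters = "".join(letter for letter, rule in RULES.items() if rule(event))
        text.append(letters or FILLER)
    return "".join(text)
```

Under this letter assignment the Attack_login signature (three occurrences of rule1) becomes the pattern "aaa", and the resulting text and patterns can be handed to any search routine that tolerates insertions.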

We used an audit file of 24,847 events and studied three different series of
actions¹:

¹ These are not really attacks, but it makes no difference from the algorithmic point
of view.
Table 1. Main parameters for the three search patterns

Attack               m    Occs.   Prob. letter   Nec. k   Max. k   Fract. of text
Chained who          4    4       0.004382       225      500      8.21%
Sensitive commands   10   2       0.007187       580      620      14.50%
Chained whois        4    1       0.001402       1425     1570     5.74%




Chained who: represented as a pattern of four events of a "who" command. 
The probability of the corresponding letter in our audit file is 0.004382 and 
there are four real attacks of this kind in the audit file. 

Sensitive commands: represented as a pattern of ten events of any command 
in the set { "last", "ps", "who", "whois" }. The probability of the corre- 
sponding letter in our audit file is 0.007187 and there are two real attacks of 
this kind in the audit file. 

Chained whois: represented as a pattern of four events of a "whois" command. 
The probability of the corresponding letter in our audit file is 0.001402 and 
there is one real attack of this kind in the audit file. 

We have searched the three patterns in our audit file allowing an increasing
number of insertions $k$. Our goal is to determine the effectiveness of the proposed
filtering algorithm². That is: how much text is it able to filter out in order to
retrieve what fraction of the real attacks that occur in the audit file?

By applying the analytical predictions of Section 3 to our real data, we computed
the maximum $k$ value for which the matching probability does not reach
1 (recall that the model is pessimistic). To compute that maximum value, we
have used the most precise formula (Eq. (1)) for the matching probability. Given
that the text is biased we have replaced $1/\sigma^m$ by $p^m$, where $p$ is the relative
frequency of the letter that forms the pattern (all the attacks are repetitions of a
single letter, otherwise we can just multiply the probabilities of the participating
letters).

Together with the maximum k recommended by the model, we have computed
the fraction of the text that the filter selects (for that k) as a candidate
for further evaluation. This is simply the m+k characters preceding every match,
with overlapping areas counted only once.
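
The computation of the selected fraction can be illustrated with a minimal
sketch. It assumes that each match is reported as the (exclusive) text position
where it ends; the function name and this position convention are illustrative
assumptions, not part of the original experiments.

```python
def selected_fraction(match_ends, m, k, n):
    """Fraction of an n-character text covered by the m+k characters
    preceding each match end, with overlapping areas counted once."""
    window = m + k
    covered = 0
    last_end = 0  # exclusive end of the region already counted
    for end in sorted(match_ends):
        # Clip the window at the text start and at the part already counted.
        start = max(end - window, 0, last_end)
        if end > start:
            covered += end - start
            last_end = end
    return covered / n


# Three matches in a text of 100 events, with m = 4 and k = 2:
# the windows [4,10) and [6,12) overlap, [24,30) stands alone.
fraction = selected_fraction([10, 12, 30], m=4, k=2, n=100)
```

Because all windows have the same length, sorting the match positions is enough
to merge overlapping windows in a single pass.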

Table 1 shows that using the maximum k recommended by the model selects
just 6% to 15% of the text to be processed by a more costly algorithm. Moreover,
we show in the column "Nec. k" (necessary k) the minimum k value that is
necessary to detect all the attacks present in the audit file. This turns out to
be below (and generally close to) the maximum k recommended by the model.
Therefore, the model can be used to obtain a good estimator of the k value to
use in order to detect all the real attacks. Of course, it is also possible to
use specific knowledge of the application to determine the appropriate k.

^ The text that our filter is not able to discard has to be processed by a more
sophisticated algorithm in order to determine the presence of a real attack. As
those algorithms are much slower than our pattern matching based approach, the
effectiveness of the filter is crucial.

Fig. 4. Fraction of attacks detected versus fraction of text left for further
processing

Regarding the false negatives, we evaluated, for our three particular patterns,
the fraction of text filtered as a function of the fraction of attacks detected
(see Fig. 4). As can be seen, the curve is concave, which suggests that
examining a very small fraction of the text is enough to detect most of the
attacks. For example, with a k value that leaves just 2% of the text for further
evaluation we get 50% of the attacks (and thus 50% of false negatives). This
gives us a way to balance the false negative rate against the speed of
detection. Of course, in many cases no false negatives can be tolerated. In that
situation, the value of k determined by the model is an upper bound on the value
to be used for the corresponding pattern.

Regarding the false positives, we studied the evolution of the number of
matches as a function of k for the three patterns (see Fig. 5). Of course, for
some patterns, using too large a k value leads to many false positives. Note
that these false positives may be discarded by the more accurate detection
algorithm that analyzes the output of our pattern matching mechanism
(recall that we provide the part of the trail containing a potential attack). To
limit false positives without allowing false negatives, the value of k
determined by the model appears to be near-optimal.

5 Conclusions and Future Work

We have addressed a performance problem in intrusion detection. The problem is
that the algorithms that accurately detect attacks in audit trails are complex
and slow, and therefore cannot cope with the huge amounts of data that are
generated when a system or a network is monitored.

We have presented a pattern matching based filter for intrusion detection.
The idea is that, although it may be difficult and slow to determine that an
attack has occurred, it is possible to quickly determine that there is no attack
in large portions of the audit trail.

The pattern matching model is based on the concept of insertion distance, 
where the attack is seen as a sequence of letters (events) and the algorithms 
detect the text portions where all the events of the attack appear in order within 
a window of k other events. Recent pattern matching algorithms [5] specialized 
for this problem are able to spot the suspicious areas of the audit trail by scanning 
millions of events per second. In this way, the pattern matching algorithm quickly 
filters out a large portion of the text, leaving the rest to be examined by a more 
sophisticated (and slower) algorithm. 
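
The insertion distance criterion can be made concrete with a deliberately naive
sketch. It is not the specialized algorithm of [5]; it only illustrates what a
match is: all pattern letters appear in order inside a text window of at most
m+k events. The function name and the string encoding of events are illustrative
assumptions.

```python
def find_candidates(text, pattern, k):
    """Return (start, end) spans of text in which all letters of pattern
    appear in order with at most k other events inserted between them."""
    m = len(pattern)
    if m == 0:
        return []
    spans = []
    for start, letter in enumerate(text):
        if letter != pattern[0]:
            continue
        j = 1                                  # next pattern letter to find
        pos = start + 1
        limit = min(len(text), start + m + k)  # window of m+k events
        while j < m and pos < limit:
            if text[pos] == pattern[j]:
                j += 1
            pos += 1
        if j == m:
            spans.append((start, pos))         # end is exclusive
    return spans


# "abc" occurs in "xaxbxxcx" with 3 inserted events between a and c.
hits_k3 = find_candidates("xaxbxxcx", "abc", k=3)
hits_k2 = find_candidates("xaxbxxcx", "abc", k=2)
```

Matching greedily from each possible start yields the shortest span for that
start, so a start position is reported exactly when some occurrence beginning
there fits into the window. This scan costs O(n(m+k)) time and is meant only to
define the output of the filter, not to reproduce its speed.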

We have presented an analytical model and preliminary experimental results 
about the fitness of the insertion distance model to detect attacks in a real 
application. We have shown that the k value predicted by the model is large 
enough to detect virtually all the relevant attacks, yet it still filters out most of 
the text (85% to 94%). This means that only 6% to 15% of the original text needs 
to be analyzed by a more sophisticated algorithm. Moreover, we have shown that 
most of the attacks are indeed found with a much smaller k value, so that speed
can be traded for precision. For example, leaving just 2% of the text to examine
we were able to detect 50% of the attacks. An optimal value for k can be found 
which minimizes false negatives and false positives for groups of patterns with 
specific characteristics. 

Several directions remain for improving the proposed approach.

Further experiments with larger and more realistic data sets must be carried
out in order to provide more accurate estimations of the filtering efficiency.

The algorithm is assumed to take a random text with uniform distribution
as input, which is not the case for our converted audit trails. The
implications of this mismatch remain to be studied.

Our experiments were conducted off-line. We now need to conduct some on- 
line experiments. In that context, the efficiency of the mapping process will have 
to be studied with care. 

To conclude, the proposed approach was used for misuse detection. It could
also be used for anomaly detection. The algorithm may then be used to verify
that a process behaves as it did during a training period, as proposed by [9].

Abstract. Privacy and surveillance by intrusion detection are potentially
conflicting organizational and legal requirements. In order to support
a balanced solution, audit data is inspected for personal data and
identifiers referring to real persons are substituted by transaction-based 
pseudonyms. These pseudonyms are constructed as shares for a suitably 
adapted version of Shamir’s cryptographic approach to secret sharing. 
Under sufficient suspicion, expressed as a threshold on shares, audit an- 
alyzers can perform reidentification. 

Keywords: privacy, anonymity, pseudonymity, audit analysis, intrusion 
detection, secret sharing, purpose binding 



1 Introduction 

Recent trends in computing and communication demand two potentially
conflicting requirements, namely surveillance and privacy. Surveillance is necessary
in order to guarantee secure services to the parties involved, even in the pres- 
ence of attacks against suboptimal security mechanisms. It is based on audit 
data about the activities of entities within a computing system. Since parts of 
such data can be associated with real persons, audit data contain personal data. 

Privacy, or more specifically informational self-determination, confines the
processing of personal data, as regulated by the pertinent legislation. Among its
basic principles, we find the need for either the data subject's informed
consent to, or the legal or contractual necessity of, processing the personal
data and, accordingly, a strict purpose binding for collecting, processing and
communicating personal data. Pseudonymization, hiding the association between
the data and real persons, can help to avoid such restrictions, provided the
original goals can still be achieved.

Obviously, there are conflicts and tradeoffs between surveillance and privacy,
and between audit analysis and pseudonymity, which have to be appropriately
resolved in a balanced way, both on the organizational level and by the
technical mechanisms. In fact, there are even acts dealing with aspects of
surveillance and privacy in the same context.

* The work described here is currently partially funded by Deutsche
Forschungsgemeinschaft under contract number Bi 311/10-1.

In this paper we provide an in-depth study of the problem of employing
pseudonyms in audit data, as sketched above, and we propose a widely
applicable solution to be used in the framework of intrusion detection systems
(IDS). Basically, our solution comprises the following features:

- Audit services are supposed to be under the primary control of the personal
  data protection official (PPO), who is trusted by the user on whose behalf
  the event generating component is operating.

- The audit services, either by themselves or by appropriate extensions,
  forward sensitive audit data to remote analyzers only if that data is required
  to detect harmful behavior of the involved entities.

- Pseudonymizers are placed just behind the standard audit services, or
  their extensions, respectively, and are still under the primary control of the
  same PPO, though the analyzers have to assign (or to enforce) some
  restricted trust in them.

- The pseudonymizers inspect standard audit data and substitute identifiers
  by carefully tailored pseudonyms. Exploiting Shamir's cryptographic
  approach to secret sharing, these pseudonyms are actually shares, which can
  be used for reidentification later on, if justified by sufficient suspicion.

- Sufficient suspicion is defined by a threshold on weighted occurrences of
  potentially harmful behavior and, accordingly, by a threshold on shares
  belonging together. Exceeding such a threshold results in the ability to reveal
  the secret, i.e. to recover the identity behind the pseudonyms.

- Thus the analyzers, holding the pseudonymized audit data including the
  shares, can reidentify the data if and only if the purpose of surveillance
  requires it.

- Since shares are mixed together, we have to take precautions that shares are
  not inappropriately combined, which would lead to innocent persons being
  blamed.
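
The threshold property underlying these features can be illustrated with a
minimal sketch of plain Shamir secret sharing. This is the textbook scheme, not
the adapted version developed in this paper; the prime modulus, the function
names and the encoding of an identity as an integer are illustrative
assumptions.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as field size (illustrative choice)


def make_shares(secret, threshold, count):
    """Split secret into count shares; any threshold of them recover it."""
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the shares supplied."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        inverse = pow(den, PRIME - 2, PRIME)  # modular inverse via Fermat
        secret = (secret + yi * num * inverse) % PRIME
    return secret


identity = 123456789  # hypothetical integer encoding of a user identifier
shares = make_shares(identity, threshold=3, count=5)
recovered = recover_secret(shares[:3])
```

In this reading, each pseudonym emitted for a suspicious event would carry one
share, and an analyzer holding fewer shares than the threshold learns nothing
about the identity.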

Our approach follows the paradigm of multilaterally secure systems that take
into account security requirements of all involved parties and balance contrary
interests in an acceptable way [1]. Conventional approaches to securing audit
prioritize the protection of interests of IT system owners, emphasizing
accountability, event reconstruction, general and security related problem and
damage assessment and, last but not least, deterrence. On the other hand, the
interests of other parties are also affected, in particular unobservability,
anonymity and unlinkability, since audit analysis deals with personal data
including but not limited to activity subjects, objects and their owners,
results, origin and time.

Once audit data is available in a processible form, it is hard to enforce
accountability of processing, for it can be performed on copies of the data
outside the domain under control. The high complexity, composability and short
life cycles of today's IT systems render the application of available
verification techniques to these systems impractical. It is exceedingly
difficult to rule out that an IT system contains a Trojan horse. It is therefore
insufficient to legally prohibit activities necessary for large-scale
surveillance. In fact, there is an increasing demand for technical enforcement
of privacy principles, reflected by emerging approaches for various applications
(see also [2]). In our context, pseudonymization is the tool to do so.

The rest of the paper is organized as follows. First we review the European 
legislation (Sect. 2), identify the paradigms regarding pseudonymization (Sect. 3) 
and survey related approaches (Sect. 4). Then we present our approach in more 
detail, specifying the audit architecture (Sect. 5) and the application of secret 
sharing to pseudonymization (Sect. 6). We conclude with an outlook on issues 
for further investigation (Sect. 7). 

2 European Legislation 

As summarized in consideration (1) in [3] the objectives of the European Com- 
munity (EC) include “ensuring economic and social progress by common action 
to eliminate the barriers which divide Europe, encouraging the constant improve- 
ment of the living conditions of its peoples, preserving and strengthening peace 
and liberty and promoting democracy on the basis of the fundamental rights rec- 
ognized in the constitution and laws of the Member States and in the European 
Convention for the Protection of Human Rights and Fundamental Freedoms”.
Accordingly IT systems must respect the right to privacy (see consideration (2) 
in [3]). From Articles 25 and 26 in Chapter IV “Transfer of Personal Data to 
Third Countries” and from Article 4 derives the relevance of the directive and 
the respective national laws for third countries regarding export and processing 
of personal data from and on EC territory, respectively. 

2.1 Fundamental Principles for the Protection of Personal Data 

While the directive amalgamates different traditions in dealing with protection
of personal data, we rather work out the fundamental principles in a discourse
on relevant German law. Important fundamental principles can be found in a
ruling [4] of the German Constitutional Court (“Bundesverfassungsgericht”),
which postulates the right to informational self-determination. Collecting and
processing personal data without consent of the data subject constitutes a
restriction of the data subject's right to informational self-determination.
Such restrictions require a statutory foundation meeting two criteria: first,
the prerequisites and scope of restrictions must be clearly identifiable;
second, and more importantly, restrictions are limited to the amount strictly
necessary to achieve the objectives supporting the restrictions. The scope and
nature of personal data being considered is determined by the objective and the
resulting restrictions. As the ruling emphasizes, the sensitivity of data
depends not merely on its nature, but rather on the pursued objective and on its
linkability and processibility.

Accordingly, measures which restrict the data subject's ability to exercise
informational self-determination, as audit and audit analysis do, must be
adequate, relevant and not excessive (read minimal) with respect to the goals,
which must be proportionate to the sensitivity of the audit data. The principles
of data avoidance and data reduction derive from this fundamental principle of
necessity. To fix sensitivity, the objectives are fixed beforehand, and
consequently the nature and scope of data being collected as well as the nature
of processing results (analysis) have to be fixed. This fixation is a
fundamental privacy principle and henceforth is referred to as purpose binding.

2.2 Instruments for the Enforcement of Privacy 

From some important European and German directives, acts, ordinances and
rulings [5, 4, 6, 3, 7, 8, 9, 10] and [11] we isolated common instruments for the
protection of personal data. We found many of the instruments to be relevant for
audit analysis. The most important ones are summarized in the following. There
are several claims concerning the data quality. Here we subsume the fundamental
principles of necessity and binding to a legal purpose and the derived demand for
data avoidance and data reduction. We can avoid personal data by not collecting
it in the first place or by anonymizing it. The German TDDSG [7], Article
2, explicitly allows profiling of teleservice users only under the condition that
pseudonyms are used and prohibits correlating profiles with personal data of
the data subject. Data reduction can be achieved by discarding personal data
as soon as it is no longer needed for the purpose, or by specifying mandatory
timeouts for data disposal.

Apart from several exceptions, processing personal data is legitimate only
when the data subject has unambiguously given his consent. Systems therefore
must be able to request and store a freely given, specific and informed
indication of the data subject's wishes by which he signifies his agreement.
Controllers are obliged to inform data subjects before, or notify them after
collection, respectively, about certain facts and rights. This can partly be
carried out by the system. Data subjects have the right to choose to act
anonymously within a system, unless this conflicts with other laws. Referring to
the instruments of information and notification, the data subjects must be
enabled to make informed decisions regarding their options. Default settings
should choose the maximum achievable anonymity and settings should allow for
differentiation of various transaction contexts. Systems need to be designed
such that data subjects are not subject to solely automatic decisions producing
legal effects concerning them. See [3], Article 15 for details.

The controller must implement appropriate technical and organizational mea- 
sures to protect personal data having regard to the state of the art and the cost 
of their implementation, ensuring a level of security appropriate to the risks 
represented by the processing and the nature of the data [3], Article 17. 

Some regulations apply directly to audit analysis for the purpose of misuse
and anomaly detection. For example, the German Data Protection Act [5],
§14(4), §31 and its forthcoming amendment [6], §14(4), §31 explicitly include a
strict purpose binding for IDS. While the act in force [5], §19(2), §33(2)2, §34(4)
allows data subjects to be refused access to IDS related personal data
concerning them, the draft amendment [6], §19(2), §33(2)2, §34(4) does so only
under the condition that the access would require a disproportionate effort. One
could still try to apply §19(4)3 and §33(2)3, respectively.

In the EC directive [11] processing of traffic and billing data is allowed for
persons handling fraud detection. In the German ordinance [8] based on act [9]
the carrier and the provider of telecommunication services may analyze one
month worth of the call database to pick suspicious calls [8] (2). They may then
collect and analyze customer and call data relating to the aforementioned
suspicions [8] (1) 2. If it is necessary in a given instance, message content
may be collected and analyzed [8] (4). The responsible supervisory authorities
and the data subjects must be notified of any analysis of the complete database
or of message content [8] (3). A recent amendment [10] reduces the obligatory
notification to establishing and modifying an analysis system and increases the
aforementioned time period from one to six months.

3 Paradigms Regarding Pseudonymization 

Multilaterally secure systems balance interests of various actors. Henceforth we 
call representatives of the system’s owners, also responsible for the system’s 
security, system security officers (SSO). To balance their power against privacy 
interests of users, they sometimes would have to cooperate with personal data 
protection officials (PPO). 

An entity is called anonymous if an attacker cannot link its identity with its 
role or activities within an anonymity group. While unlinking events from identi- 
ties by means of pseudonymization as a kind of (reversible) data avoidance is just 
one method among others for achieving anonymity, we focus on pseudonymiza- 
tion here. Most data protection acts demand a reasonable strength of anonymity 
in consideration of the effort required by an attacker and of the risk represented 
by the processing and nature of the data. We consider some criteria determining 
the strength of anonymity achieved. 

The attacker model specifies against whom anonymity is achieved, and
whether it persists if any parties, participants or not, compile and correlate
information. A related design decision is when to pseudonymize. Personal data
shall be pseudonymized before entering domains monitored by adversaries. The
earlier collected data is pseudonymized, the larger the hostile domains that can
be handled.

The existence of an instance with the ability for reidentification is an
important factor influencing strength of anonymity. Whether such an instance
exists is affected by the choice of method for pseudonymization as well as by
who introduces pseudonyms. Both determine the participant(s) needed to cooperate
for reidentification. Reidentification should be independent from the anonymous
entities but be infeasible for adversaries. Note that an IDS and associated
personnel (SSOs) are regarded as adversaries to the entities' privacy, which
necessitates that the IDS be constructed in a way that either requires
technically enforced cooperation of PPOs or confines analysis and results
exclusively to the purpose of intrusion detection. In the absence of techniques
enforcing such a restriction to a purpose, one can scrutinize and certify the
functionality of the implemented audit analysis component. Our contribution
offers a method for technical enforcement of purpose binding.

Parameterization influencing reidentification as well as quantity and quality
of profiling should be transparently coordinated with all involved parties
or their representatives. Accordingly, audit analysis needs to be carried out on
pseudonymized data. Another conclusion is that domains controlled by IDS and
associated personnel must be strictly separated from domains where personal data
is processed in the clear. Appropriate organizational as well as technical
measures need to be taken and made transparent to all involved parties or their
representatives.

Another indicator for the strength of anonymity is the cardinality of the
anonymity group. An attacker should experience greater losses than gains when
damage by reidentification is done to all members of the anonymity group instead
of just one entity. The cardinality of the anonymity group is bounded by the
range of valid values for the features being pseudonymized. In addition,
utilizing a bijection as pseudonymization method is insecure for sparsely
populated anonymity groups. In that case it seems advisable to choose an
injection.

One more pertinent criterion is the linkability of activities with the same
person. The selection of which features are pseudonymized and the class of
pseudonyms utilized substantially influence linkability. For a discussion
of user identifying features and which features to pseudonymize refer to [12].
Note that the design has to consider the degree of linkability required by the
audit analysis method. If certain data cannot be analyzed in pseudonymous form,
strength of anonymity needs to be traded off against quality of analysis.

An entity has one identity but may use one or more pseudonyms. We con- 
sider various classes of pseudonyms differentiated by reference and achieving 
increasing degrees of anonymity [13]: Entity-based pseudonyms may be public 
or non-public or anonymous. Role-based pseudonyms can be relation-based or 
transaction-based. 

Finally, we always have to keep in mind that we do not need to pseudonymize
data we do not collect. Consequently, audit should not collect and permanently
store data not strictly necessary for the purpose of analysis, e.g. intrusion
detection. Another general point is that false positives are not only a nuisance
for SSOs; they also affect the users' privacy due to undesirable
reidentifications [14, 15]. We therefore propose the following framework for
reidentification handling:

Each reidentification is audited with the respective reason, including the
information available and necessary for an SSO to investigate the validity of
the reidentification. Such investigations may encompass taking note of
statements about certain aspects of user behavior. Such statements most
certainly are sensitive personal data. The SSOs should be technically restricted
to receiving statements about those aspects of behavior that appear to have
caused the anomaly under investigation. On the basis of the investigation
results, an SSO decides whether the affected user can be informed about the
reidentification without compromising security. This would be the case for false
alarms. If the results raise the suspicion that it would be advantageous not to
inform the user, the SSO would mark the reidentification audit record as
deferred and would have to document the reason associated with the record. The
PPOs could be notified when an SSO defers a reidentification record, and they
must be able to review all deferred records together with the respective reasons
in a timely manner. PPOs can thus attend to their control function and could
resolve conflicts resulting from SSO misbehavior externally to the system. Many
parts of the framework could be enforced by the design of the IDS.
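
As a rough illustration of how such a framework might be enforced by design,
the following sketch models a reidentification audit record that can only be
deferred together with a documented reason, and a review query for PPOs. All
class, field and function names are hypothetical; the paper does not prescribe
a data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReidentificationRecord:
    pseudonym: str                        # pseudonym that was resolved
    reason: str                           # why reidentification was triggered
    deferred: bool = False                # True if the user is not informed yet
    deferral_reason: Optional[str] = None

    def defer(self, why: str) -> None:
        """An SSO may defer informing the user only with a documented reason."""
        if not why.strip():
            raise ValueError("a deferral must be justified")
        self.deferred = True
        self.deferral_reason = why


def deferred_records(log: List[ReidentificationRecord]) -> List[ReidentificationRecord]:
    """Records a PPO has to review, together with their reasons."""
    return [record for record in log if record.deferred]


log = [
    ReidentificationRecord("p-17", "threshold exceeded: repeated failed logins"),
    ReidentificationRecord("p-42", "threshold exceeded: port scan"),
]
log[1].defer("ongoing investigation")
to_review = deferred_records(log)
```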

4 Related Approaches 

We evaluated four related approaches: the Intrusion Detection and Avoidance 
system (IDA) [16, 17, 12], the Adaptive Intrusion Detection system (AID) [18, 
19, 12, 20, 21], a Firewall audit analyzer (FW) [14, 15], and the Aachener Net- 
work Intrusion Detection Architecture (ANIDA) [22], as well as our approach 
Pseudonymizer with Conditional Reidentification (Pseudo/CoRe). 

We used several criteria, derived from the paradigms above, to capture the
privacy properties of the systems. Table 1 shows whether the systems have
special provisions for data avoidance and data reduction, and whether a part of
the system architecture itself enforces the separation of domains under control
of SSOs or PPOs, respectively. We denote the class of pseudonyms in use and how
cumulation of information regarding entity-based or relation-based pseudonyms
is limited, which method is applied for generating pseudonyms, whether
there are explicit identity mapping data structures and whether they are
available to the SSO in protected form, and whether pseudonyms are introduced
locally on the machine where the audit is generated, or elsewhere. Knowing after
which point of the activity flow audit data is pseudonymized allows us to infer
in which domains it remains unprotected. It is furthermore important to know
whether a system pseudonymizes merely directly identifying features or also
other potentially identifying features, whether the analysis establishes
linkability of activities via pseudonyms or other features, and whether access
control in the monitored system is based on identifying features. An extremely
important area is the reidentification handling: whose cooperation is required
for an SSO to reidentify an entity, and which entity ensures the binding of
reidentification to the purpose of intrusion detection. If an SSO requires no
cooperation from a PPO or her technical representative, then it is of utmost
importance that there are technical provisions for purpose binding, controlled
by a PPO. There are two cases in AID, the first concerning real-time intrusion
detection, the second relating to after-the-fact review of archived audit
records.

5 Audit Architecture 

Most modern operating systems offer similar audit components. No matter
whether audit data is generated using AIX, BSDs, HP-UX, Linux, Solaris,
Windows NT or another operating system, the identity of the acting subject is
part of the information being recorded [23, 24]. In particular, operating
systems designed for compliance with the Trusted Computer System Evaluation
Criteria (TCSEC) [25, 26] or the Common Criteria (FAU_GEN.1.2 and FAU_GEN.2)
[27] have to provide audit information identifying a security relevant event's
originator.

Table 1. An overview of the privacy properties of related approaches. ‘%’
denotes unavailable information, ‘()’ brackets indicate the controlling entity,
‘n.a.’ means not applicable

criteria              | IDA                  | AID                      | FW                    | ANIDA                | Pseudo/CoRe
----------------------|----------------------|--------------------------|-----------------------|----------------------|-----------------------
data reduction        | no                   | no                       | no                    | anomaly filter       | n.a.
data avoidance        | no                   | no                       | no                    | client IP            | n.a.
features              | subjects             | identifying features     | subjects, hosts       | subject              | configurable
pseudonyms            | entity               | entity                   | entity                | relation/transaction | transaction
cumulation limitation | rekeying             | rekeying                 | remapping             | no need              | no need
method                | symmetric encryption | symmetric encryption     | sequence numbers, NAT | %                    | secret sharing
introducer            | local                | local                    | local                 | TTP                  | local
pseudonymization      | in reference monitor | after Solaris BSM auditd | after firewall audit  | after TTP login      | after e.g. syslogd
analysis linkability  | pseudonyms           | pseudonyms               | pseudonyms            | group IDs            | n.a.
access control on     | pseudonyms           | identities               | identities            | group IDs            | identities
identity mapping      | implicit, SSO        | implicit, SSO            | unprotected SSO       | %, TTP               | random, encrypted, SSO
reidentification      | manually             | automatic                | manually              | %                    | automatic
cooperation           | PPO                  | none / PPO               | none                  | TTP (PPO)            | none
purpose binding       | PPO                  | analyzer (SSO)/PPO       | no                    | PPO                  | pseudonymizer (PPO)
domain separation     | no                   | no                       | no                    | yes                  | yes

To substantiate placement of a pseudonymizer we chose Unix/Solaris as an 
exemplary platform. Figure 1 depicts an abstraction of components relevant for 
auditing at kernel and user level. Basically we find two auditing situations. Some 
audit streams go directly into dedicated files, others are funneled through central 
service points, syslogd and auditd respectively, adding certain supplementary 
information before merging and/or distributing the streams.

Fig. 1. Solaris audit facilities and placement opportunities for pseudonymizer
components



The audit daemon auditd is part of the SunSHIELD Basic Security Module 
(BSM) [28] which is included with Solaris starting with release 2.3. BSM is 
intended to supplement Solaris for TCSEC C2 compliance and is not activated 
by default. Audit events may be delivered to auditd by the kernel and by user 
level applications. 

BSM Kernel level events comprise system calls (see ‘Application 1’ in Fig. 1) 
which allow for potentially security sensitive actions. A few select user level 
applications included in the Solaris release emit events via auditd, such as some 
local and remote login services, passwd, inetd, mountd, etc. 

Like auditd, syslogd collects event records from kernel and user level
sources. Usually as an option, syslogd also accepts remotely generated event
records at UDP port 514. For management purposes syslog also handles event
classes differentiating event sources and event priorities, referred to in
syslog parlance as facilities and severity levels, respectively. The event to
class mapping is determined by the event sources; syslogd's behavior concerning
incoming events is configurable with respect to the events' classification and
priority. Events can be ignored, sent to files, devices, named pipes, remote
hosts or to the console for selected or all users. Except for the event header,
the record format is determined by the event source.
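
To make the notions of facility and severity level concrete, the following
sketch decodes the priority value that prefixes a record in the BSD syslog wire
format, where the value is facility * 8 + severity (as later documented in
RFC 3164). The function name and the sample record are illustrative
assumptions; records stored in local files usually no longer carry this prefix.

```python
import re

# Severity names in numeric order 0..7, as used by BSD syslog.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def decode_priority(record):
    """Return (facility number, severity name) of a '<PRI>...' syslog record."""
    match = re.match(r"<(\d{1,3})>", record)
    if match is None:
        raise ValueError("record does not start with a priority value")
    priority = int(match.group(1))
    if priority > 191:  # 23 * 8 + 7 is the largest valid value
        raise ValueError("priority value out of range")
    facility, severity = divmod(priority, 8)
    return facility, SEVERITIES[severity]


# Priority 34 = facility 4 (security/authorization) * 8 + severity 2 (critical).
facility, severity = decode_priority("<34>Oct 11 22:14:15 host su: example event")
```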

We differentiate three categories of input data for audit analysis: host-based
data is derived from sources internal to individual hosts, including some types
of events concerning networking; network-based data derives from network
associated sources; and out-of-band data is derived from other sources. While
expanding on different data categories is helpful for classifying audit analysis
(e.g. intrusion detection) systems [29, 30], we were interested in covering as
many existing sources as possible with one non-invasive approach to
pseudonymizer placement.

All mentioned event data categories are represented in Fig. 1. Host-based 
data is collected in storage components labeled ‘Host-Audit’, ‘Accounting’ and 
‘App.-Audit’ for application audit. The respective sources are applications and 
the kernel including the BSM audit module and the accounting component. 
Network-based data is also extracted in a kernel module, in our case using ip
filter [31]. The packet audit data path in Fig. 1 is actually a simplified illustration:
ip filter, as commonly set up, sends audit events via ipmon to syslogd. The
data path of other suitable products may vary. Acquisition of out-of-band data
is usually implemented by means of applications querying other sources; such
sources are thus represented in Fig. 1 by applications.

5.1 Choosing an Audit Format for the Prototype 

For our first prototype we chose to support the syslog audit format. The available 
audit facilities use various event record formats. For our prototype we wanted to 
initially support the record format featuring the widest applicability and the least 
limitations. Several common audit formats have been proposed [32, 33, 34]
and many IDS use their own canonical format, but in the past these formats have
not been taken up on a large scale. Event sources using their own audit files either
use a format similar to syslog or a record format of their own. In the first case, we
treat them as we treat all syslog clients. Implementing a prototype specialized 
to handling the latter case would limit applicability substantially. Accounting 
monitors the utilization of shared system resources. Its record format is very sim- 
ilar on all Unixes, involving low recording overhead. An audit record is emitted 
after the respective event terminates, resulting in accounting records for daemon 
processes normally being withheld. Thus accounting will not generate privacy 
problems for networked service clients, but sensitive information regarding lo- 
cal users is recorded. The sensitivity of network packet audit data depends on 
their content’s detail level. When network packets contain external knowledge or 
application level data, pseudonymization of packet audits should be considered. 
As mentioned above, TCSEC C2 audit record formats vary between implemen- 
tations of diverse vendors. Tailoring the prototype to BSM audit records would 
limit its applicability. Nevertheless, TCSEC C2 audit implementations, particularly
Sun’s BSM, are quite popular with host-based IDS, which can potentially impair
the privacy of users associated with a login session.

Last but not least the centralized syslog audit facility is widely utilized by the 
kernel and by user level applications for auditing, owing to its uniformity and 
availability in all significant Unixes. It has its uses in troubleshooting and limited 
manual intrusion detection. While only serious actions of users associated with a 
login session appear in syslog events, many network services audit via syslog with 
varying levels of detail. TCP wrappers on hosts with huge login user populations, 
busy web sites and mail exchangers can generate remarkable amounts of audit 
records containing personal data. 

5.2 Embedding Pseudonymizers

Providing pseudonymization directly at the source of audit events, namely in the
kernel or user level applications, does not scale well, for all event sources would
have to be modified. Event data should therefore be pseudonymized somewhere on
its way from the event source to the event sink, but before it enters domains controlled
by expected adversaries. It is thus in most cases not sensible to integrate
pseudonymization with the event sink (audit file or audit analyzers).

System call interfaces are provided for auditing by means of both auditd and
syslogd. Using wrappers for the appropriate auditing system calls,
pseudonymization of audit data can be performed before the data enters the 
daemons. Another solution is anchoring pseudonymization directly in the dae- 
mon code. While these approaches are feasible for Solaris, as its source code is 
available, this may not be the case for other platforms. 

A third approach is pseudonymizing the output event streams of auditd and
syslogd. This can be achieved directly by having the daemons audit into
pipes rather than indirectly via temporary files, as depicted in Fig. 2.




Fig. 2. A pseudonymizer and a redirector 



This solution is also applicable to sources emitting events directly to files 
without using central auditing services. For manageability and technical reasons 
it can be profitable to further audit centralization. If we wanted ‘Application 2’ in
Fig. 1 to audit to syslogd instead of the ‘App.-Audit’ file, we would need a redirector
as in Fig. 2, which picks up the audit stream as the pseudonymizer does and uses
the syslog call interface to deposit the events. In case an application executes
accesses on its audit files beyond appending events, the redirector could only be 
used by indirection via files. Those files shouldn’t be pruned while the application 
might revisit old contents. 
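
The placement of a pseudonymizer on an output event stream can be sketched as a simple filter between a source and a sink. This is only an illustration under assumptions: the function names, the regular expression and the placeholder replacement are hypothetical, and in a deployment the source might be a pipe written by syslogd rather than the in-memory stream used here.

```python
import io
import re


def pseudonymize_stream(source, sink, pattern, make_pseudonym):
    """Copy records from source to sink, replacing each match of pattern."""
    for record in source:
        sink.write(pattern.sub(lambda m: make_pseudonym(m.group(0)), record))


def placeholder(identity):
    """Stand-in for illustration only; a real pseudonymizer issues shares."""
    return "<pseudonym>"


# In-memory streams stand in for the pipe and the audit file.
source = io.StringIO("su: BAD SU deedee to root on /dev/ttyp1\n")
sink = io.StringIO()
pseudonymize_stream(source, sink, re.compile(r"(?<=BAD SU )\S+"), placeholder)
```

Because the filter only reads and appends, it matches the case of applications that merely append events; applications that revisit their audit files would need the indirection via files described above.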

5.3 Pseudonymizing syslog-Style Audits

The pseudonymizer shall receive the input audit records from syslogd and while 
pseudonymizing shall be able to differentiate audit records belonging to different 
event types, cluster audit records belonging to the events that always occur
conjointly, associate events with different attack scenarios, and prioritize events 
within an attack scenario. 

The pseudonymizer accomplishes these requirements based on knowledge provided
by its administrator, preferably a PPO. He specifies the syntax of event
types and related features as well as memberships of event types in event type 
clusters. For brevity we will subsequently abstract clustered event types into a 
new event type containing the same information, particularly the same features, 
as the cluster members. PPOs, reasonably in collaboration with SSOs, specify 
which events occur in which attack scenario and the priority of their features’ 
contribution to the attack. The same knowledge is made available to reidentifiers. 

In order to pseudonymize syslog audit records, we have to analyze and parse 
the respective format. Parsing is based on recognizing some basic concepts, which 
are defined as follows. Recognition of concepts is based on the evaluation of
syntactical context, e.g. by means of regular expressions for pattern matching.

| Oct 20 20:48:29 | | pony | identd[22509]: token TWpldDm02sq65FfQ82zX == uid 1000 (deedee)

The first framed field in the record above specifies the time and date when 
syslogd handled the record, which is the symptom of an event having occurred 
in an application process which resides on the host denoted by the other framed 
field. For the sake of presentation, we will henceforth omit time/date and host 
fields from audit records. The framed fields in the record below specify the 
name of the facility and optionally the facility’s process ID. Facilities can be 
applications or kernel components. 



| identd | [ | 22509 | ]: token TWpldDm02sq65FfQ82zX == uid 1000 (deedee)

| su |: BAD SU deedee to root on /dev/ttyp1
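
Recognizing the header fields described so far (time and date, host, facility and optional process ID) can be sketched with one regular expression. The pattern and the function name `parse_record` are assumptions for illustration; real syslog output may deviate from this layout.

```python
import re

# Timestamp, host, facility, optional [pid], then the free-form message.
RECORD = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<facility>[^\s:\[\]]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_record(line):
    """Return the header fields of a syslog-style record, or None."""
    match = RECORD.match(line)
    return match.groupdict() if match else None
```

For the su record the `pid` field is `None`, reflecting that the process ID is optional.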



The event type context (framed below) of a record uniquely specifies a type 
of event specific to a given facility. We define an event type as a combination of 
a facility and an event type context. 



identd[22509]: | token | TWpldDm02sq65FfQ82zX | == uid | 1000 (deedee)

su: | BAD SU | deedee to root | on | /dev/ttyp1



An audit record of a specific event type may contain a number of features 
identifying entities. Features like user IDs or user names may directly identify 
entities; others may be indirectly identifying features [12]. Below some identifying 
features are framed. 



identd[22509]: token TWpldDm02sq65FfQ82zX == uid | 1000 | ( | deedee | )

su: BAD SU | deedee | to | root | on | /dev/ttyp1 |









A feature type is uniquely specified by its feature type context, which is specific 
to the event type of the audit record. We define a feature type as a combination 
of an event type and a feature type context. Each feature is understood as an 
instance of a feature type. Below the context of the feature 1000 of the type 
‘identd token user ID’ is framed. 

identd[22509]: token TWpldDm02sq65FfQ82zX == | uid | 1000 | ( | deedee)



The basic idea of our approach is cryptographically enabling reidentification 
if an entity has caused at least as many possibly attack related events as specified 
by a threshold. Since various attack methods produce differing numbers of audit 
records of various event types, we allow for the grouping of event types, and for 
assignment of an own threshold $t_g$ to each group. It may be the case that different
feature types of an event type contribute with a specific weight to different attack
scenarios. We therefore actually assign feature types to groups $I_g$, which represent
the different attack scenarios $A_g$. In addition we assign a weight function $w_{fg}()$
to each feature type $f$, representing the priority of $f$’s specific contribution to $A_g$.

The knowledge about event types and attack scenarios is thus represented by
triplets $(f, I_g, w_{fg}())$ for each feature type $f$. Each triplet expands to a 5-tuple:

(facility, event type context, feature type context, $I_g$, $w_{fg}()$)

There may be multiple occurrences of the identity of the same entity in 
different feature types of one or more audit records. We thus can associate the 
same entity, though not the same feature type, with different attack scenarios. As 
a result, pseudonyms of a specific entity, which are delivered from the same $I_g$,
contribute to the same attack scenario and therefore contribute to reaching $t_g$. This
is not the case for pseudonyms of a specific entity, which are delivered from
different $I_g$’s.

The tuples defined above form a tree with a virtual root connected to all 
specified facilities. Each facility is connected to the event type contexts contained 
in tuples with the same facility. Likewise each event type context is connected to 
all feature type contexts contained in tuples with the same facility and event type 
context. $I_g$ and $w_{fg}()$ are stored with their respective feature type context. For an
incoming audit record the pseudonymizer determines in the tree the matching 
facility. Next, in the subtree of the matched facility the matching event type 
context is determined. Finally, the matching feature type context is determined 
in the subtree of the event type. 
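
The tree lookup can be sketched with nested dictionaries. Everything concrete here is an assumption for illustration: the regular expressions used as contexts, the group names `g_probe` and `g_su`, and the simplification of the weight function to a constant.

```python
import re

# Hypothetical rules: (facility, event type context, feature type context,
# group, weight). Each feature type context captures the feature as "id".
RULES = [
    ("identd", r"^token \S+ == uid", r"uid (?P<id>\d+) \(", "g_probe", 1),
    ("identd", r"^token \S+ == uid", r"\((?P<id>[^)]+)\)$", "g_probe", 1),
    ("su", r"^BAD SU", r"^BAD SU (?P<id>\S+) to", "g_su", 2),
    ("su", r"^BAD SU", r" to (?P<id>\S+) on", "g_su", 1),
]


def build_tree(rules):
    """facility -> event type context -> list of feature type entries."""
    tree = {}
    for facility, event_ctx, feature_ctx, group, weight in rules:
        features = tree.setdefault(facility, {}).setdefault(event_ctx, [])
        features.append((re.compile(feature_ctx), group, weight))
    return tree


def match_features(tree, facility, message):
    """Return (feature, group, weight) for every matching feature type."""
    found = []
    for event_ctx, features in tree.get(facility, {}).items():
        if re.search(event_ctx, message):
            for pattern, group, weight in features:
                match = pattern.search(message)
                if match:
                    found.append((match.group("id"), group, weight))
            break  # the first matching event type context decides
    return found
```

A record from an unspecified facility yields an empty list, so it would pass through unchanged.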

The pseudonymizer performs an identity update. It isolates the feature or
identity, respectively, retrieves the data structure denoted by $I_g$ and checks
whether the identity is already a member of $I_g$. In case it is not, an entity
entry for the identity is allocated and initialized in $I_g$. Finally it generates $w_{fg}()$
pseudonyms for the identity and replaces the identity in the audit record by
the generated pseudonyms. The higher the priority of a specific feature type’s
contribution to $A_g$, according to $w_{fg}()$, the more pseudonyms are generated for
the identity.

Proceeding similarly, reidentifiers can relate incoming pseudonyms to their
respective $I_g$.

6 Applying Secret Sharing to Pseudonymization

The basic idea of our approach is to have a pseudonymizer, acting as representative
of the anonymity group of all its clients, split an identifying feature $id_{ig}$ into
as many pseudonyms as are needed to pseudonymize audit records containing
$id_{ig}$, at most¹ $P - 1$ pseudonyms. The pseudonyms shall have the property
that given any $t_g$, but not fewer, pseudonyms of $id_{ig}$ taken from pseudonymous
audit records, a reidentifier is able to recover $id_{ig}$. Secret sharing schemes are
suitable to fulfill these requirements.

For our purposes we exploit Shamir’s threshold scheme, as described in detail 
in [35] with some modifications. Shamir’s threshold scheme has some desirable 
properties: it is perfect, ideal and it does not rely on any unproven assumptions. 
New shares can be computed and issued without affecting preceding shares, and 
providing an entity with more shares than others, bestows more control upon 
that entity. 
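
The threshold property can be illustrated with a minimal sketch of Shamir's scheme over $GF(P)$. The prime, the function names and the sample coefficients are assumptions chosen for illustration, not taken from [35].

```python
import secrets

P = 2**127 - 1  # a Mersenne prime, assumed here; must exceed every secret


def make_polynomial(secret, t):
    """Random polynomial of degree t - 1 over GF(P) with p(0) = secret."""
    return [secret] + [secrets.randbelow(P) for _ in range(t - 1)]


def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x, modulo P."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % P
    return y


def recover(shares):
    """Lagrange interpolation at x = 0 from (x, y) shares with distinct x."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

With the fixed polynomial 5 + 3x + 2x^2 (threshold 3), the shares (1, 10), (2, 19), (3, 32) recover the secret 5, whereas the first two shares alone interpolate to 1. A share computed later at a new x-coordinate works together with earlier ones, which is the property that allows shares to be issued step by step.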

6.1 Deviations from Shamir’s Threshold Scheme 

Owing to different conditions of deployment of Shamir’s threshold scheme, we
make some modifications with regard to its application. Firstly, in our scenario
we don’t have a group of participants, each of which confidentially receives
and stores one or more shares, until some of them pool $t_g$ shares for secret
recovery. Instead, we have one or more reidentifiers, each of them receiving all
shares. A reidentifier is always in the position to recover the secret, which is
the identity $id_{ig}$ associated with a polynomial $p_{ig}(x)$, as soon as it has received $t_g$
compatible shares, which are the pseudonyms of $id_{ig}$. Additionally, since from
the point of view of the pseudonymizer all reidentifiers are potential adversaries,
the confidentiality requirements regarding shares cease to apply.

While in conventional applications of secret sharing schemes it is feasible to
estimate in advance the number of shares to be issued, the same is impractical
in our scenario, for it is unknown which identities will require pseudonym
generation in the near future. We thus take a stepwise approach to the choice of
$x$-coordinates and to share generation, and we distribute $p_{ig}(x)$ paired with its $x$.
We have to preclude linkability with respect to the $x$-coordinates of shares within a group of
feature types. Accordingly we choose unique $x$-coordinates for shares of identities
that are members of the same $I_g$.
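
The stepwise approach can be sketched as a small per-group state object that allocates a polynomial when an identity is first seen and hands out shares with $x$-coordinates that are unique within the group. The class name, the prime and the random choice of coordinates are assumptions for illustration.

```python
import secrets

P = 2**61 - 1  # prime modulus, assumed for illustration


class Group:
    """State a pseudonymizer might keep for one feature type group."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.polynomials = {}  # identity -> coefficients; p(0) is the secret
        self.used_x = set()    # x-coordinates already issued in this group

    def issue(self, identity):
        """Return a fresh share (x, p(x)) for the given identity."""
        coeffs = self.polynomials.get(identity)
        if coeffs is None:  # identity update: allocate an entry on demand
            coeffs = [secrets.randbelow(P) for _ in range(self.threshold)]
            self.polynomials[identity] = coeffs
        while True:
            x = 1 + secrets.randbelow(P - 1)  # x = 0 would reveal the secret
            if x not in self.used_x:
                break
        self.used_x.add(x)
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo P
            y = (y * x + c) % P
        return x, y
```

Since $x$ ranges over the non-zero field elements, at most $P - 1$ shares can be issued per group under this sketch.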

Normally certain products of pairwise combinations of the x-coordinates of 
shares pooled for secret recovery can be precomputed and issued to the partic- 
ipants in order to improve the performance of secret recovery using Lagrange 
interpolation. Since this optimization is impractical in our stepwise approach, 
we do not use it. 

¹ The secret sharing scheme we apply operates over $GF(P)$.

6.2 Mapping Secrets to Identities

As the assignment of identities to different feature type groups $I_g$ and respective
thresholds $t_g$ is just a matter of applying rules specified by an administrator
(see Sect. 5.3), for our forthcoming discussion we regard a given $I_g$ and omit
the index $g$ wherever applicable. In our approach we do not share identities
directly. Instead we assign to each identity a secret that is unique with respect to $I$
and share this secret. For reidentification after secret recovery we need a function
(or table) that maps secrets to identities and, in doing so, fulfills some basic
security requirements:

— Since the SSOs using the reidentifier need to control the host on which the 
reidentifier operates, and since SSOs in our attacker model are potential 
adversaries, we cannot rely on the Unix operating system to protect the 
confidentiality of the mapping. The function itself has to provide for its
confidentiality.

In its basic version the mapping from secrets to identities is accomplished by
storing cryptograms of the identities, regarding the decryption key $k_i$ as the
secret, matching the encryption key $k_i$, pseudo-randomly chosen and unique
with respect to $I$. After an identity update (see Sect. 5.3), $I$ is to be made available to
the reidentifiers.

— Once reidentified, an identity shall not be indefinitely linkable to forthcoming 
events caused by the respective entity. We have to trade off the anonymity of 
entities against the linkability of events. Since linkability is required by au- 
dit analysis to some degree, we carefully have to trade off anonymity against 
accountability. Other parameters include but are not limited to performance 
and storage requirements for handling growing numbers of shares and lim- 
itations to the number of issueable shares, and the performance penalty 
imposed by limiting measures as well as their consequences. 

To limit linkability we change the mapping after expiry of an epoch. During 
an epoch, identities are not deleted from the mapping. After expiry the map- 
ping is discarded and rebuilt on demand during the next epoch. Alternatively 
appropriate rekeying of all identities could be performed. 



6.3 The Mismatch Problem 

Provided the pseudonymizer issues shares for $id_i$ as unmarked pairs $(x, p_i(x))$,
then a reidentifier cannot determine which shares within a group $I$ belong to the
same identity, unless it tries to recover an identity from each combination of $t$
shares.

While ‘blindly’ choosing combinations of $t$ shares, a reidentifier will not always
draw $t$ shares stemming from the same polynomial. Consider a reidentifier,
which, as depicted in Fig. 3, chooses three shares stemming from three different
polynomials $p_2$, $p_3$ and $p_4$. The solution $p^*$ of the linear equations in this case
matches in $s^* = p^*(0)$, the secret of $p_1$. If the reidentifier recovers the respective
identity $id_1$, correct accountability could not be established. Accordingly, if the
solution of a combination of $t$ shares matches a secret $s_i$ in $I$, it is denoted as a
valid match if all shares are compatible, i.e. stem from the same polynomial $p_i$,
otherwise it is called a mismatch.
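
A mismatch can be reproduced with a toy example over $GF(13)$ and threshold 3. The four polynomials below are constructed for illustration only; they are not the ones shown in Fig. 3.

```python
P = 13  # toy prime so that the numbers stay readable


def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x, modulo P."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % P
    return y


def recover(shares):
    """Lagrange interpolation at x = 0 from (x, y) shares with distinct x."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


# Coefficient lists; the first entry of each list is the secret p(0).
polys = {
    "p1": [5, 6, 2],
    "p2": [3, 1, 1],
    "p3": [7, 2, 3],
    "p4": [11, 1, 5],
}

# One share each from p2, p3 and p4: an incompatible combination.
mixed = [(1, evaluate(polys["p2"], 1)),
         (2, evaluate(polys["p3"], 2)),
         (3, evaluate(polys["p4"], 3))]

# Three shares that all stem from p1: a compatible combination.
compatible = [(x, evaluate(polys["p1"], x)) for x in (1, 2, 3)]
```

Both `recover(mixed)` and `recover(compatible)` evaluate to 5, the secret of p1, although only the second combination is a valid match.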




Fig. 3. A mismatch: the solution $s^*$ of a combination of $t$ shares, not all stemming
from the same polynomial, matches a secret $s_i$, though it should not



6.4 Tackling the Mismatch Problem 

A straightforward approach to the mismatch problem discussed in Sect. 6.3 is
to supply reidentifiers with information enabling them to determine whether
two shares stem from the same polynomial. One, but not the best, method to
accomplish this is marking all shares stemming from the same polynomial $p_i$,
and thus from the same identity $id_i$ within $I$, with an identical label.

This approach has a weakness concerning the unlinkability of audit records.
Audit records caused by the same identity in a feature type group $I$ are linkable
with respect to the labels, even before $t$ pseudonyms have been issued. In other
words: if we always mark them, we have relation-based, not transaction-based
pseudonyms. Subsequently we suggest and compare three further approaches to
mismatch handling avoiding this problem.

Mismatch Avoidance by Deferred Marking. Instead of immediately letting reidentifiers
know which shares are compatible, we can defer this information until
$t$ compatible shares have been issued. Before this condition holds, reidentifiers
can’t recover the respective secret anyway.

Mismatch Avoidance without Marking. Instead of marking shares, the pseudonymizer
can take precautions, altogether avoiding the occurrence of mismatches. To
enable the pseudonymizer and reidentifier to check for matching combinations
of $t$ shares, we have to provide a verifier with each entity in $I$. In its basic
version, a verifier is just the value of a secure one-way hash function of the
secret, e.g. a message digest. Before issuing a share, the pseudonymizer checks
whether all combinations of $t$ incompatible shares, each including the new share, are
free of mismatches. If a mismatch is found, the share is discarded and another
one is chosen and tested. Since the reidentifier cannot look at the labels to find
compatible shares, it has to test whether any combination of $t$ shares that includes
the incoming share matches.
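
The reidentifier's blind search with verifiers can be sketched as follows. The choice of SHA-256, the encoding of secrets as decimal strings and all function names are assumptions for illustration.

```python
import hashlib
from itertools import combinations

P = 2**61 - 1  # prime modulus, assumed for illustration


def recover(shares):
    """Lagrange interpolation at x = 0 from (x, y) shares with distinct x."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


def verifier(secret):
    """Basic verifier: a one-way hash value of the secret."""
    return hashlib.sha256(str(secret).encode()).hexdigest()


def search_matches(pool, new_share, t, verifiers):
    """Test every combination of t shares that includes new_share."""
    matches = []
    for others in combinations(pool, t - 1):
        candidate = recover(list(others) + [new_share])
        if verifier(candidate) in verifiers:
            matches.append(candidate)
    return matches
```

The number of combinations tested per incoming share grows quickly with the size of the pool, which is the cost this approach places on the reidentifier.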

Mismatch Verification without Marking. We have to keep in mind that pseudonymizers
always need to reside on production machines to be able to protect
the user’s privacy. It is thence imperative to minimize the performance penalty
imposed upon the machine by the pseudonymizer. Reidentifiers instead could
use dedicated machines. We can modify the aforementioned mark-free approach
by not having the pseudonymizer avoid mismatches, but having the reidentifier
verify matches with the pseudonymizer’s help. The pseudonymizer in this case
takes no precautions to avoid mismatches and just issues unmarked pseudonyms.
As in the approach above, the reidentifier ‘blindly’ searches for matching combinations
of $t$ pseudonyms. It then verifies the validity of a match by providing
the pseudonymizer with $I$, the share-combination, a
nonce and the secure one-way hash value of the concatenation of the nonce and
the matching solution. The pseudonymizer accepts the query if the
nonce is fresh, searches the data structure denoted by $I$ for the appropriate entity,
and uses the respective polynomial to test whether the supplied shares are
compatible.
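
The pseudonymizer's side of this verification query can be sketched as follows. The class and method names, the use of SHA-256 and the in-memory nonce store are assumptions for illustration; the sketch covers a single group $I$.

```python
import hashlib

P = 2**61 - 1  # prime modulus, assumed for illustration


def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x, modulo P."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % P
    return y


def digest(nonce: bytes, secret: int) -> str:
    """Hash of the nonce concatenated with the matching solution."""
    return hashlib.sha256(nonce + str(secret).encode()).hexdigest()


class Pseudonymizer:
    def __init__(self, group):
        self.group = group  # identity -> coefficients; coeffs[0] is the secret
        self.seen_nonces = set()

    def verify(self, shares, nonce, claimed):
        """Answer whether the supplied shares form a valid match."""
        if nonce in self.seen_nonces:
            return False  # nonce is not fresh: reject the query
        self.seen_nonces.add(nonce)
        for coeffs in self.group.values():
            if digest(nonce, coeffs[0]) == claimed:
                # Shares are compatible if all lie on this polynomial.
                return all(evaluate(coeffs, x) == y for x, y in shares)
        return False
```

The query carries only a hash of the solution, and the pseudonymizer does the cheap part (a few polynomial evaluations), leaving the combinatorial search with the reidentifier.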

In Table 2 we give a brief overview of all approaches. The rows refer to
the properties being compared: privacy refers to the linkability of audit records
from an entity in a specific attack scenario before $t$ shares have been issued; if
records are unlinkable, we assign a ‘+’. If offline validation can be implemented,
a ‘+’ is assigned. The other rows denote the performance penalty imposed by
the pseudonymizer ‘P’ and the reidentifier ‘R’, respectively. It can be seen that
the approach mismatch avoidance by deferred marking has the most desirable
properties.



Table 2. A comparison of all proposed mismatch handling approaches

                        avoidance    avoidance     avoidance     verification
                        by marking   by deferred   no marking    no marking
                                     marking
privacy                 -            +             +             +
offline validation      +            +             +             -
performance penalty P   + low        + low         - high        + low
performance penalty R   + low        + low         - high        - high

6.5 Enhanced Mapping Requirements

Hitherto we regarded reidentifiers as potential adversaries and equated them
with other attackers. In some environments it might be profitable to raise further
obstacles for attackers who are not reidentifiers.

When using differentiated protection areas, initially we perform a secure key
exchange, such that both pseudonymizer and reidentifier know an encryption
key $k_e$ and an optional symmetric key $k_v$. The reidentifier knows the respective
decryption key $k_d$. Secrets being shared then are not defined as the keys $k_i$
needed for decrypting the identity cryptograms. Instead, $k_i$ is encrypted under
$k_d$ to form a secret. In case no labels are used, verifiers are not merely the
one-way hash value of the secret, but the value of a keyed one-way hash function
under $k_v$, applied to the secret. Note that the effect of using $k_v$ is insignificant,
since an attacker who is unaware of $k_d$ is in any case unable to decrypt the
identities.

Since we use a kind of transaction-based pseudonyms, it would be advanta- 
geous to hide the number of actually involved entities, in order to make it harder 
for an adversary to apply external knowledge. This cannot be achieved under 
immediate marking for mismatch handling. In a nutshell, we introduce dummy
entries in $I$, which are treated like entries for real identities, except that they
are annotated as dummies.



7 Open Issues and Further Research 

There are a number of promising areas for future research related to our ap- 
proach. Regarding anonymity we are interested in further methods for reducing 
the granularity of reidentifications, as well as potential benefits and required 
properties of dummy audit records. 

In addition we will focus on intrusion detection related issues. Our approach 
is extendible to allow associating a specific feature type with more than one 
group, which may be needed for attack modeling. Additionally we may consider 
extending a priori knowledge rules by a condition to differentiate internal and 
external instances of feature types. This would allow us to provide different
thresholds and weights for internal and external identities, respectively.

Another promising area is the derivation of pseudonymization parameters 
from attack contexts extracted from knowledge data in intrusion detection systems.
Exploiting secret sharing access structures for modeling the weight functions
$w_{fg}()$ seems auspicious. Other directions worth investigating are adaptation
of the cardinality of anonymity groups to the attention level of the analysis
component, linkability and other issues concerning epoch transition. 

Further investigation is required considering one or more adversaries coor- 
dinating the attack over several machines and user accounts. A single attacker 
using accounts of identical names on several machines could be handled by a 
centralized pseudonymizer. In case different user accounts act in concert, cor- 
relation must be carried out using features other than account names, as in 
non-pseudonymous intrusion detection. Existing techniques, such as connection
fingerprinting, would have to be extended for pseudonymity. 

A final area of investigation is the complementation of audit pseudonymiza- 
tion with detectability of integrity loss. This might be achieved by marrying our 
approach with the one of Schneier and Kelsey [36, 37]. 

Abstract. As the recent distributed Denial-of-Service (DDOS) attacks
on several major Internet sites have shown us, no open computer network 
is immune from intrusions. Furthermore, intrusion detection systems 
(IDSs) need to be updated timely whenever a novel intrusion surfaces; 
and geographically distributed IDSs need to cooperate to detect dis- 
tributed and coordinated intrusions. In this paper, we describe an exper- 
imental system, based on the Common Intrusion Detection Framework 
(CIDF), where multiple IDSs can exchange attack information to detect 
distributed intrusions. The system also includes an ID model builder, 
where a data mining engine can receive audit data of a novel attack from 
an IDS, compute a new detection model, and then distribute it to other 
IDSs. We describe our experiences in implementing such system and the 
preliminary results of deploying the system in an experimental network. 



1 Introduction 

As network-based computer systems play increasingly vital roles in modern soci- 
ety, they have become the targets of our enemies and criminals. The security of 
a computer system is compromised when an intrusion takes place. An intrusion 
can be defined as “any set of actions that attempt to compromise the integrity, 
confidentiality, or availability of a resource” [3]. Intrusion prevention techniques, 
such as encryption, authentication (e.g., using passwords or biometrics), and de- 
fensive programming, have been used to protect computer systems as the first 
line of defense. However, intrusion prevention alone is not sufficient: as
systems become ever more complex while security is still often an afterthought,
there are always exploitable weaknesses in the systems due to design and programming
errors, or various “socially engineered” penetration techniques. For
example, after it was first reported many years ago, exploitable “buffer over- 
flow” security holes, which can lead to an unauthorized root shell, still exist in 
some recent system software. Furthermore, as illustrated by recent Distributed 



H. Debar, L. Mé, and F. Wu (Eds.): RAID 2000, LNCS 1907, pp. 49-65, 2000.
(c) Springer-Verlag Berlin Heidelberg 2000






Wenke Lee et al. 



Denial-of-Service (DDOS) attacks launched against several major Internet sites 
where security measures are in place, the protocols and systems that are de- 
signed to provide services (to the public) are inherently vulnerable to attacks 
such as DOS. Intrusion detection can be used as another wall to protect network 
systems because once an intrusion is detected, e.g., in the early stage of a DOS 
attack, response can be put into place to minimize damages, gather evidence for 
prosecution, and even launch counter attacks. 

Intrusion detection techniques can be categorized into misuse detection and 
anomaly detection. Misuse detection systems, e.g., IDIOT [7] and STAT [4], use 
patterns of well-known attacks or weak spots of the system to match and iden- 
tify known intrusions. For example, a signature rule for the “guessing password 
attack” can be “there are more than 4 failed login attempts within 2 minutes” . 
The main advantage of misuse detection is that it can accurately and efficiently 
detect instances of known attacks. The main disadvantage is that it lacks the 
ability to detect the truly innovative (i.e., newly invented) attacks. Anomaly 
detection systems, e.g., IDES [12], flag observed activities that deviate signif- 
icantly from the established normal usage profiles as anomalies, i.e., possible 
intrusions. For example, the normal profile of a user may contain the averaged 
frequencies of some system commands used in his or her login sessions. If for a 
session that is being monitored, the frequencies are significantly lower or higher, 
then an anomaly alarm will be raised. The main advantage of anomaly detection 
is that it does not require prior knowledge of intrusion and can thus detect new 
intrusions. The main disadvantage is that it may not be able to describe what
the attack is and may have a high false positive rate.
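
The signature rule for the “guessing password attack” quoted above can be sketched as a sliding-window check. The function name and its defaults are assumptions for illustration, and timestamps are assumed to be seconds in ascending order.

```python
from collections import deque


def password_guessing_alerts(failed_login_times, max_failures=4, window=120):
    """Return the times at which more than max_failures failed logins
    fall within the last `window` seconds."""
    recent = deque()
    alerts = []
    for t in failed_login_times:
        recent.append(t)
        while t - recent[0] > window:  # drop failures older than the window
            recent.popleft()
        if len(recent) > max_failures:
            alerts.append(t)
    return alerts
```

Five failures within 40 seconds raise an alert at the fifth one, whereas failures spaced a minute apart never accumulate to more than three in any two-minute window.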

In 1998, the Defense Advanced Research Projects Agency (DARPA) sponsored
the first Intrusion Detection Evaluation [11] to survey the state-of-the-art
of research in intrusion detection. The results indicated that the research sys- 
tems were much more effective than the leading commercial systems. However, 
even the best research systems failed to detect a large number of new attacks, 
including those that can lead to unauthorized user or root access. It is very obvi- 
ous that the enemies, knowing that intrusion prevention and detection systems 
are installed in our networks, will attempt to develop and launch new attacks. 
By definition, misuse detection techniques are ineffective against new intrusions. 
While it is critical that we develop effective anomaly detection algorithms to de- 
tect novel attacks, it is also very important that we develop a mechanism where 
once a novel attack is detected (necessarily as an anomaly at first) its behavior 
is analyzed and a specific detection model is built and widely distributed. That 
is, we need to turn a novel attack into a “known” one as quickly as possible 
so that appropriate detection and response mechanisms are in place in a timely 
manner. 

The recent DDOS attacks pose a serious challenge to the current de facto 
practice where an IDS is only concerned with its local network environment, 
without communication with other IDSs in the Internet. As described by Dit- 
trich [2], a DDOS attack is normally accomplished by first breaking into hun- 
dreds (and even thousands) of poorly secured machines around the Internet and 
installing packet generation “slave” programs on these compromised systems.
Remote “master” programs (or the attacker) control these slave programs to 
send packets of various types to a target host on the network. Even though each 
slave can just send malicious packets in an amount small enough to be con- 
sidered “acceptable”, the resulting flood, due to a huge number of such slaves 
getting involved simultaneously, can effectively shut the target system out of 
normal operation for periods ranging up to several hours. Although an IDS on 
a target system may detect a DOS, it alone cannot determine that there are 
many compromised systems in the Internet and that they are used to launch the 
attack. In other words, the localized approach can only detect a small part of the 
large-scale distributed attack, and can thus suggest only limited (and often use- 
less) response. On the other hand, if IDSs in the Internet have a communication 
framework where they can exchange attack information, then upon detecting a 
DOS, an IDS can broadcast the attack instance to other IDSs, which can in turn 
activate specific modules (if they are not already running) to look for and kill 
the slave programs (if there are any in their local environments) responsible for 
the DDOS attack. 

Our research aims to develop techniques for detecting novel and distributed 
intrusions. In this paper, we describe an experimental system, based on the 
Common Intrusion Detection Framework (CIDF) [17], where geographically dis- 
tributed IDSs can communicate with each other by following the protocols de- 
fined in CIDF. For example, the IDSs can exchange attack information that 
includes attack source, method, behavior, and response, etc. Moreover, upon 
detecting a novel attack, an IDS can send the relevant audit data to a “model 
builder”, which in turn automatically analyzes the data and computes a new 
detection model specifically for the attack, and distributes the model to other 
IDSs for local customization/translation and installation.

The rest of the paper is organized as follows. We first briefly describe the data 
mining technologies that enable the model builder. We then give an overview of 
the specifications of CIDF. We next describe the design and implementation of 
our experimental system. We then describe the experiments of using our system 
to detect (new) DDOS attacks. We compare our research with related work, and 
conclude the paper with a discussion of future research directions. 



2 MADAM ID: A Data Mining Approach for Building 
ID Models 

Currently, building an IDS is a labor-intensive knowledge engineering task where 
“expert knowledge” is codified as the detection models, i.e., the misuse detec- 
tion rules or the measures on system features for normal profiles. Given the 
complexities of today’s network systems, expert knowledge is often incomplete 
and imprecise; as a result, IDSs have limited effectiveness (i.e., accuracy). Fur- 
ther, since the development process is purely manual, updates to IDSs, due to 
new attacks or changed network configurations, are also slow and expensive. 
We have been researching and developing a more systematic and automated
approach for building IDSs. We have developed a set of tools that can be applied 
to a variety of audit data sources to generate intrusion detection models. We call 
the collection of these tools MADAM ID (Mining Audit Data for Automated 
Models for Intrusion Detection) [8,10]. The central theme of our approach is to 
apply data mining programs to the extensively gathered audit data to compute 
models that accurately capture the actual behavior (i.e., patterns) of intrusions 
and normal activities. This approach significantly reduces the need to manually 
analyze and encode intrusion patterns, as well as the guesswork in selecting sta- 
tistical measures for normal usage profiles. The resultant models can be more 
effective because they are computed and validated using a large amount of audit
data. Results from the 1998 DARPA Intrusion Detection Evaluation [11] showed 
that the detection models produced by MADAM ID had one of the best perfor- 
mances (i.e., with the highest true positive rates while keeping the false alarm 
rates within the “tolerable” ranges) among the participating systems, most of 
which were knowledge engineered. 

The main elements of MADAM ID include the programs for computing ac- 
tivity patterns from audit data, constructing features from the patterns, and 
learning classifiers for intrusion detection from audit records processed accord- 
ing to the feature definitions. The process of using MADAM ID is shown in 
Figure 1. 



Fig. 1. The data mining process of building ID models



The end product of MADAM ID is a set of classification rules that can be used 
as intrusion detection models. We consider intrusion detection as a classification 
problem because ideally we want to classify each audit record into one of a 
discrete set of possible categories, i.e., normal, a particular kind of intrusion, or 
anomaly. Given a set of records, where one of the features is the class label (i.e.,
the concept), classification algorithms can compute a model that uses the most 
discriminating feature values to describe each concept. An example rule output 
from RIPPER [1], a classification rule learner, is the following: 

pod :- wrong_fragment >= 1, protocol_type = icmp.

This “ping of death” rule uses a conjunction of two conditional tests, each of
which checks the value of a feature, e.g., “wrong_fragment”. Before we can apply
classification algorithms, we need to first select and construct the right set of
system features that may contain evidence (indicators) of normal activity or
intrusions. In fact, feature selection/construction is the most challenging
problem in building IDSs, regardless of the development approach in use.
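
As a minimal sketch of how such a rule acts as a classifier, the Python
fragment below applies the two conjuncts of the “ping of death” rule to a
single connection record. The dictionary layout and the function names are
assumptions made for illustration; they are not part of RIPPER or MADAM ID.

```python
# Illustrative only: a connection record is modeled as a plain dict whose
# keys are the feature names used in the rule above.

def matches_pod(record):
    """Return True when both conjuncts of the "ping of death" rule hold."""
    return record["wrong_fragment"] >= 1 and record["protocol_type"] == "icmp"


def classify(record):
    # Records not covered by the rule fall through to the default class.
    return "pod" if matches_pod(record) else "normal"


suspicious = {"wrong_fragment": 1, "protocol_type": "icmp"}
benign = {"wrong_fragment": 0, "protocol_type": "tcp"}
print(classify(suspicious), classify(benign))
```

A learned rule set would normally contain one such predicate per attack
class, tried in order, with a default class for records that match none.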

We exploit the temporal and statistical nature of network traffic and apply 
data mining programs to compute patterns for feature constructions. As shown 
in Figure 1, raw audit data, e.g., tcpdump [5] data of network traffic, is first 
processed into packet/event level ASCII data, which is further summarized into 
connection/session level records. Each record is defined by a set of basic and 
general-purpose features, e.g., start time, duration, source and destination hosts 
and ports, number of bytes transferred, and a flag that indicates the protocol- 
level behavior of the connection (e.g., SF for normal SYN and FIN), etc. Data 
mining algorithms that are optimized for audit data are then applied to compute 
various activity patterns from the audit records, in the forms of per-host and 
per-service frequent sequential patterns [9]. Given a set of extensively gathered 
normal audit data and another audit data set that includes an intrusion instance, 
we can compare the patterns from the normal data and the “intrusion” data to 
identify the “intrusion-only” patterns, i.e., those that appear only in the
“intrusion” data. These patterns are then parsed to construct appropriate features
that are predictive of the intrusion. For example, a pattern of “SYN flood” is
shown in Table 1. Accordingly, the following features are constructed for “SYN
flood”: a count of the connections to the same destination host in the past 2
seconds, and among these connections, the percentage that are to the same service,
and the percentage that have the S0 flag. In prior work [8], we showed that these
constructed features have high information gain, and can therefore improve the 
accuracy of the classification rules. 
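
The three constructed “SYN flood” features can be sketched as follows. This
is an illustration under stated assumptions, not the MADAM ID code: the record
field names are hypothetical, and the current connection itself is assumed not
to be counted among the connections “in the past 2 seconds”.

```python
# Illustrative only: each connection record is a dict with the basic
# features named in the text (start time in seconds, destination host,
# service, and flag). The field names are assumptions.

def syn_flood_features(history, current, window=2.0):
    """Compute the three constructed features for `current`.

    `history` holds earlier connection records; only those to the same
    destination host that started within the past `window` seconds count.
    """
    recent = [
        r for r in history
        if r["dst_host"] == current["dst_host"]
        and 0 <= current["time"] - r["time"] <= window
    ]
    count = len(recent)
    if count == 0:
        return {"count": 0, "same_srv_rate": 0.0, "S0_rate": 0.0}
    same_srv = sum(1 for r in recent if r["service"] == current["service"])
    s0 = sum(1 for r in recent if r["flag"] == "S0")
    return {
        "count": count,
        "same_srv_rate": same_srv / count,
        "S0_rate": s0 / count,
    }


history = [
    {"time": 0.1, "dst_host": "victim", "service": "http", "flag": "S0"},
    {"time": 0.5, "dst_host": "victim", "service": "http", "flag": "S0"},
    {"time": 0.9, "dst_host": "victim", "service": "smtp", "flag": "SF"},
    {"time": 1.0, "dst_host": "other", "service": "http", "flag": "SF"},
]
current = {"time": 1.5, "dst_host": "victim", "service": "http", "flag": "S0"}
features = syn_flood_features(history, current)
```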

The process of applying MADAM ID to build intrusion models, as shown in 
Figure 1, involves multiple steps and iterations. For example, poor performance 
of a model often suggests that additional features need to be constructed, and/or 
additional kinds of patterns need to be computed, etc. We have developed a 
process-centered approach to automate this process: the completion of one step 
automatically triggers the next step in the process; and heuristics are used to 
automatically tune the programs of each step, via parameter selection, to achieve 
performance improvement over the previous iteration. The resulting
“process-centered” system can service real-time “model building” requests by accepting
audit data from an IDS and iterating through the process to compute a desired 
detection model. 
Table 1. An example of intrusion patterns

Frequent episode:
    (flag=S0, service=http, dst_host=victim),
    (flag=S0, service=http, dst_host=victim)
    -> (flag=S0, service=http, dst_host=victim)
    [0.93, 0.03, 2]

Meaning:
    93% of the time, after two http connections with S0 flag (i.e., only one
    SYN packet is sent) are made to host victim, within 2 seconds from
    the first of these two, the third similar connection is made, and this
    pattern occurs in 3% of the data



3 An Overview of CIDF 

In 1997, a group of research projects funded by DARPA began a collaborative 
effort called the Common Intrusion Detection Framework (CIDF). The motiva- 
tion of CIDF was to provide an infrastructure that allows intrusion detection, 
analysis, and response (IDAR) systems and components to share information 
about distributed and coordinated attacks. 

A major design goal of CIDF is that IDAR systems can be treated as “black 
boxes” that produce and consume intrusion-related information. According to 
the roles the IDAR components play in CIDF, they can be categorized as the 
event generators (E-boxes), analysis engines (A-boxes), response engines
(R-boxes), and databases (D-boxes). All four kinds of CIDF components exchange
data in the form of Generalized Intrusion Detection Objects (GIDOs), which are 
represented via a standard common format, defined using the Common Intrusion 
Specification Language (CISL) [18]. A GIDO encodes the fact that some par- 
ticular events happened at some particular time, or some analytical conclusion 
about a set of events, or an instruction to carry out an action. 

Given that there is a wide variety of intrusion-related information, CISL 
needs to be flexible and extensible. The main language construct of CISL is the 
general-purpose S-expression [16]. S-expressions are simply recursive groupings 
of tags and data. An example S-expression is: 

(FileName ’/etc/passwd’) 

This S-expression simply groups two terms FileName and ’/etc/passwd’ to- 
gether. The advantage of S-expressions is that they provide an explicit asso- 
ciation between terms, without limiting what those terms and their groupings 
might express. In CISL, intrusion-related data is expressed as a sequence of S- 
expressions with two or more elements. The first element always indicates how 
to interpret the data that follows, i.e., it is a tag that provides a semantic “clue”
to the interpretation of the rest of the S-expression. For this reason, these tags 
are called Semantic IDentifiers, or SIDs for short. As an example, the report of 
an event “user Joe deleted /etc/passwd” can be expressed as: 
(Delete
    (Initiator
        (UserName ’Joe’)
    )
    (FileSource
        (FileName ’/etc/passwd’)
    )
)
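
To make the nesting concrete, the sketch below models an S-expression as a
nested Python tuple and prints it in the layout used above. The tuple
representation and the `render` helper are assumptions for illustration (they
are not the CIDF APIs), and each node is assumed to carry either plain data
or nested S-expressions, not a mixture.

```python
# Illustrative only: an S-expression is modeled as a tuple whose first item
# is the SID and whose remaining items are data strings or nested tuples.

def render(sexpr, indent=0):
    """Render a nested tuple as indented S-expression text."""
    pad = "    " * indent
    sid, *args = sexpr
    if all(isinstance(a, str) for a in args):
        # Leaf: a SID followed by quoted data, kept on one line.
        data = " ".join("'" + a + "'" for a in args)
        return f"{pad}({sid} {data})"
    lines = [f"{pad}({sid}"]
    for a in args:
        lines.append(render(a, indent + 1))
    lines.append(f"{pad})")
    return "\n".join(lines)


report = (
    "Delete",
    ("Initiator", ("UserName", "Joe")),
    ("FileSource", ("FileName", "/etc/passwd")),
)
print(render(report))
```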

A set of CIDF APIs is provided for encoding and decoding GIDOs. Encoding 
a GIDO involves first translating the S-expression into a corresponding tree-like 
structure, then encoding the structure into a sequence of bytes. Decoding the byte
sequence back into a tree structure simply reverses the above procedure. Each
SID code indicates, in a bit of the first byte, the type of argument that the SID
takes: an elementary datum, an array, or a sequence of S-expressions. The parser
then interprets the succeeding bytes accordingly. The tree can be printed in 
S-expression format for further processing, i.e., extracting the intrusion-related 
data, by the CIDF component. 
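
The encode/decode round trip can be illustrated with a toy byte format. The
layout below is invented for this sketch and is not the actual GIDO wire
format: one header byte holds a hypothetical flag bit (nested S-expressions
versus a single elementary datum) plus the SID name length, and SIDs are
stored as text rather than as numeric codes. Array arguments are omitted.

```python
import struct

NESTED = 0x80  # hypothetical flag bit: argument is a sequence of S-expressions


def encode(node):
    """Encode (sid, datum) or (sid, child, child, ...) into bytes."""
    sid, *args = node
    name = sid.encode("utf-8")
    if len(name) > 0x7F:
        raise ValueError("SID too long for this toy format")
    if len(args) == 1 and isinstance(args[0], str):
        data = args[0].encode("utf-8")
        return bytes([len(name)]) + name + struct.pack(">H", len(data)) + data
    body = b"".join(encode(child) for child in args)
    return bytes([NESTED | len(name)]) + name + bytes([len(args)]) + body


def decode(buf, pos=0):
    """Decode one node starting at `pos`; return (node, next_pos)."""
    header = buf[pos]
    name_len = header & 0x7F
    pos += 1
    sid = buf[pos:pos + name_len].decode("utf-8")
    pos += name_len
    if not header & NESTED:
        (size,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        return (sid, buf[pos:pos + size].decode("utf-8")), pos + size
    count = buf[pos]
    pos += 1
    children = []
    for _ in range(count):
        child, pos = decode(buf, pos)
        children.append(child)
    return (sid, *children), pos


tree = (
    "Delete",
    ("Initiator", ("UserName", "Joe")),
    ("FileSource", ("FileName", "/etc/passwd")),
)
wire = encode(tree)
decoded, end = decode(wire)
assert decoded == tree and end == len(wire)
```

The point of the flag bit is that the parser can decide how to read the
following bytes without knowing the meaning of the SID in advance.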

CIDF also provides a matchmaking service, i.e., a matchmaker, through
which CIDF components can make themselves known to other components and
locate communication “partners” with which they can share information and
request or provide services. The matchmaker supports feature-based lookup by
grouping the CIDF components based on their capabilities. Communications in
CIDF need to be as secure as possible because the intrusion-related data being
transmitted is critical to the well-being of the IDAR systems. The
matchmaker thus provides authenticated and secured communications between 
CIDF components by acting also as a Certificate Authority (CA). The CIDF 
messages, i.e., GIDO packets, can include authentication headers and can be 
encrypted. 

4 MADAM ID as a Modeling Engine in CIDF 

Researchers have laid a lot of groundwork in defining CISL, GIDO encoding and 
decoding APIs, and the communication protocols, including matchmaking and 
authentication, between CIDF components. Research projects or experiments re- 
lated to CIDF tend to focus on how two IDSs communicate, e.g., by exchanging 
attack and response data. In addition to studying how IDSs can cooperatively 
detect distributed attacks in real-time, we are interested in using CIDF to facil- 
itate the distribution of new intrusion detection models so that “novel” attacks 
will have a very short “life span”. That is, we study how a model builder, e.g., 
MADAM ID, can receive attack data, then rapidly and automatically produce 
appropriate models, and distribute them to IDSs for installation.

4.1 Design Considerations 

When an IDS detects an attack, it can broadcast the instance report to other 
IDSs, which in turn can check whether the same attack is launched against their 
local environments, or whether suspicious local activities may have caused the
attack on other environment(s). Such cooperation among IDSs is well supported
by CIDF. The challenge is to build up and maintain a set of SIDs and a dictionary 
on their possible values that can be used to accurately describe attack scenarios. 

Introducing a modeling service into CIDF is not as straightforward as it may
seem. Note that an “analysis engine” of CIDF has limited capabilities in that it
draws a conclusion from event data (i.e., whether and what kind of intrusion has
occurred) and can even suggest a response, but it does not provide a detection
method. When a modeling service is available, upon detecting a new intrusion
(as an anomaly), an IDS encodes the relevant audit data (e.g., network traffic
within a time window) into GIDOs and transmits the GIDOs to the “modeling
engine”, i.e., the process-centered MADAM ID system. MADAM ID then performs
pattern mining, feature construction, and rule learning using the audit
data extracted from the GIDOs. MADAM ID keeps a large store of “baseline”
normal patterns so that “intrusion patterns” can be easily identified for the
reported attack/anomaly. It also keeps a large amount of “historical” attack data
and patterns so that if the reported attack is an old attack or a slight variant, an
“updated” rule can be produced by training on the combination of “historical”
and “new” data. We see here that a modeling engine includes the functionalities
of an analysis engine and a database.



[Figure 2 labels: IDS, Modeling Engine, Match Maker / CA; attack data flows from the IDSs.]

Fig. 2. A CIDF architecture



The most challenging issue in adding a modeling engine into CIDF is the
encoding of features and rules into GIDOs. A set of SIDs needs to be defined
to express the computational instructions for each feature and rule so that an
IDS receiving an “ID model GIDO” can automatically parse the instructions and
generate local execution modules. Each rule can also include an “accuracy” (i.e.,
confidence) measurement so that an IDS can decide whether to accept or reject
the rule. An updated rule will also have the special tag “updated” so that the old
rule can be replaced. Figure 2 shows the architecture of CIDF where IDSs can share
attack information with each other, and send attack data to the modeling engine, 
which in turn computes and distributes new detection models. A matchmaker 
is responsible for hooking up the IDSs with each other and with the modeling 
engine, and for facilitating authenticated and secured communications between 
CIDF components. 
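
The accept/reject decision and the role of the “updated” tag can be sketched
as follows. The dictionary fields, the function name, and the 0.9 threshold
are assumptions chosen for illustration; the paper does not specify a local
acceptance policy.

```python
# Illustrative only: a received rule is a dict; the field names and the
# 0.9 threshold are assumptions, not values from the paper.

def install_rule(installed, rule, min_accuracy=0.9):
    """Accept or reject `rule`; return True when it was installed.

    `installed` maps an attack type to the rule currently in use.
    """
    if rule["accuracy"] < min_accuracy:
        return False  # the local site does not trust the rule enough
    attack = rule["attack_type"]
    if attack in installed and not rule.get("updated", False):
        return False  # keep the existing rule unless told to replace it
    installed[attack] = rule
    return True


installed = {}
install_rule(installed, {"attack_type": "syn_flood", "accuracy": 0.95})
```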

4.2 Implementation of an Experimental System 

We implemented an experimental system, based on CIDF, where MADAM ID 
is the modeling engine, and Bro [14] and NFR [13] are the two real-time IDSs. 
We also implemented a system that acts as both a matchmaker and a CA. We 
describe our experiences here.



The Modeling Engine As shown in Figure 1, MADAM ID normally starts the 
model building process from raw audit data, but can also directly use the pro- 
cessed connection/session records. In either case, an IDS needs to supply audit 
data to MADAM ID. We used a new SID, AuditData, that at the encoding end
specifies a (local) file that contains the audit data, and another tag,
AuditDataSize, that specifies the size of the audit data. When encoding an
S-expression that contains these two tags, the audit data size is recorded in the
GIDO, followed by the content of the audit data file, so that the receiving end has
sufficient information to accurately extract the audit data.
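
The size-then-content layout can be sketched as a length-prefixed payload.
The 8-byte big-endian size field and the function names are assumptions for
this sketch; the actual field width used in the GIDO encoding is not stated
here.

```python
import struct


def pack_audit_data(content):
    """Prefix the audit data bytes with their length (8-byte big-endian)."""
    return struct.pack(">Q", len(content)) + content


def unpack_audit_data(payload):
    """Recover the audit data; fail loudly if the payload is truncated."""
    (size,) = struct.unpack_from(">Q", payload, 0)
    content = payload[8:8 + size]
    if len(content) != size:
        raise ValueError("truncated audit data")
    return content


audit = b"tcpdump bytes would go here"
assert unpack_audit_data(pack_audit_data(audit)) == audit
```

Recording the size first lets the receiver know exactly how many bytes belong
to the audit data before it resumes parsing the rest of the message.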

To encode features and rules in GIDOs, we first enumerated a set of
“essential” (i.e., bootstrap) features that all IDSs must know how to compute. These
features include: source host, source port, destination host, protocol type,
service, duration, flag, etc. We assigned each feature a unique id, and introduced a
number of new SIDs for specifying features and their values. For example, the
feature condition “flag has value S0” is represented as the following S-expression:

(FeatureCond
    (FeatureID ’flag’)
    (Cond
        (CondOperator ’equal’)
        (CondValue ’S0’)
    )
)



MADAM ID constructs new features as functions (i.e., some operations or
computations) of some (existing) feature conditions for the connections that
satisfy certain constraints. The feature construction operators include count,
percent, average, etc. The data constraints include same (e.g., same destination
host or same service), different, time window (e.g., 2 seconds), etc. We used a
number of new SIDs to define new features. As an example, one of the “SYN
flood” features, “for the connections to the same destination host in the past
2 seconds, the percentage that have the S0 flag”, is expressed as the following
S-expression:



(FeatureDef
    (FeatureID ’S0_rate’)
    (Constraint
        (ConstraintType ’same’)
        (ConstraintValue ’destination host’)
    )
    (Constraint
        (ConstraintType ’time’)
        (ConstraintValue ’2 seconds’)
    )
    (Operation
        (Operator ’percent’)
        (FeatureCond
            (FeatureID ’flag’)
            (Cond
                (CondOperator ’equal’)
                (CondValue ’S0’)
            )
        )
    )
)
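
A receiving IDS has to turn such a definition into a local computation. The
sketch below shows one way this could work, with the FeatureDef transcribed by
hand into a Python dict. The dict layout, the record field names, and the
helper names are all assumptions for illustration; only the operators and
constraint kinds named in the text (count, percent, same, different, time)
are handled.

```python
# Illustrative only: the FeatureDef above, transcribed into a dict. The dict
# layout, field names, and helper names are assumptions for this sketch.
S0_RATE = {
    "id": "S0_rate",
    "constraints": [("same", "dst_host"), ("time", 2.0)],
    "operator": "percent",
    "cond": ("flag", "equal", "S0"),
}


def cond_holds(record, cond):
    feature, op, value = cond
    if op == "equal":
        return record[feature] == value
    raise ValueError(f"unsupported operator: {op}")


def selected(history, current, constraints):
    """Keep the records in `history` that satisfy every data constraint."""
    rows = list(history)
    for kind, arg in constraints:
        if kind == "same":
            rows = [r for r in rows if r[arg] == current[arg]]
        elif kind == "different":
            rows = [r for r in rows if r[arg] != current[arg]]
        elif kind == "time":
            rows = [r for r in rows if 0 <= current["time"] - r["time"] <= arg]
        else:
            raise ValueError(f"unsupported constraint: {kind}")
    return rows


def compute_feature(definition, history, current):
    rows = selected(history, current, definition["constraints"])
    hits = sum(1 for r in rows if cond_holds(r, definition["cond"]))
    if definition["operator"] == "count":
        return hits
    if definition["operator"] == "percent":
        return hits / len(rows) if rows else 0.0
    raise ValueError(f"unsupported operator: {definition['operator']}")
```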



An intrusion detection rule, which is simply a sequence of conjuncts on feature 
conditions, can then be specified using the following form: 



(DetectionRule
    (AttackType ...)
    (FeatureDef ...) [optional]
    (FeatureCond ...)
)



That is, the expression first specifies the type of attack (i.e., an intrusion 
name) that can be detected by this rule. It then describes the definitions of any 
new features used in the rule, followed by a sequence of feature conditions. 
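
Because a rule is a conjunction of feature conditions, evaluating it against
a record reduces to checking that every condition holds. In the sketch below
the rule is a hand-written dict; the layout, the operator names other than
’equal’, and the 0.8 threshold are assumptions for illustration and are not
taken from a rule produced by MADAM ID.

```python
# Illustrative only: a DetectionRule transcribed into a dict; the layout
# and the threshold values are assumptions, not rules from the paper.
import operator

OPERATORS = {
    "equal": operator.eq,
    "greater_equal": operator.ge,
    "less_equal": operator.le,
}

RULE = {
    "attack_type": "syn_flood",
    "conditions": [
        ("flag", "equal", "S0"),
        ("S0_rate", "greater_equal", 0.8),
    ],
}


def rule_fires(rule, record):
    """A rule is a conjunction: every feature condition must hold."""
    return all(
        OPERATORS[op](record[feature], value)
        for feature, op, value in rule["conditions"]
    )


def detect(rules, record):
    """Return the attack type of the first rule that fires, else None."""
    for rule in rules:
        if rule_fires(rule, record):
            return rule["attack_type"]
    return None
```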

We can see that as long as the IDSs and the modeling engine understand the 
same vocabulary, i.e., the set of SIDs and the specific values (i.e., the features, 
operators, constraints, etc.), new intrusion detection models can be expressed 
and parsed unambiguously. When an IDS rejoins CIDF after a period of absence,
its vocabulary needs to be updated, by exchanging GIDOs with the modeling 
engine. 
In our experimental system, MADAM ID listens on a socket for incoming
audit data, and sends new intrusion detection models to all IDSs. A “wrapper”,
which consists of function calls to CIDF APIs, along with data dictionaries of 
the features, operators, constraints, etc., is added to MADAM ID so that audit 
data can be extracted and detection models can be encoded. 




Fig. 3. The CIDF interfaces of IDS 



The IDSs We used Bro and NFR in our experimental system for two reasons. 
First, they are “programmable” IDSs because both filter network traffic streams 
into a series of events, and execute scripts, e.g., Bro policy scripts or NFR N- 
codes, which contain site-specific event handlers, i.e., intrusion detection and 
handling rules. Since the event handling scripts are interpreted by the IDSs, 
adding new detection rules does not require re-building the IDSs. This feature
facilitates fast updates of IDSs when new detection models are distributed.
The second consideration is a practical one: we have been using these two
real-time IDSs for various experiments in the past couple of years, and have the
source code of both systems.

As illustrated in Figure 3, for each IDS, we implemented a CIDF daemon, 
which is responsible for receiving and decoding GIDOs, and a CIDF client, which 
is responsible for encoding and sending GIDOs. Upon execution, the CIDF Dae- 
mon creates a Shared Message Queue that allows it to communicate with the 
IDS whenever new messages arrive. This queue will be constantly monitored by 
the IDS (more details to follow). The CIDF Daemon listens on a port,
e.g., 3295 (0x0CDF), for incoming GIDOs. Upon connection from a client (i.e.,
a CIDF Client), the daemon verifies that the received message is a valid GIDO
and decodes it into S-expressions. The decoded data is then queued into the Shared
Message Queue to signal the IDS for further processing. 

The packet filtering engine of each IDS was modified in order to support
communication with the CIDF Daemon. Since “connection-finished” is a
condition checked by a packet filtering engine, the probing of the Shared
Message Queue is scheduled to occur whenever the engine processes a
connection-finished event. Upon receiving the message (i.e., the S-expressions), the packet
engine queues an event along with the S-expressions into the Event Queue that
it shares with the script interpreter. CIDF-related events, e.g., interpreting the
S-expressions to take appropriate actions, are handled by the cidf_process script
function. A “model interpreter” is implemented as a separate process, invoked by
the cidf_process function, to parse the S-expressions that describe a new
intrusion detection rule. The generated local intrusion handlers can then be inspected
by human experts and loaded into the IDS. Note that incoming
data from other CIDF components has to come through the packet engine to
reach event handlers because not all IDSs support the functionality by which the
interpreted script can load data from the local file system into the program space
(of an IDS).
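
The hand-off between the daemon and the packet engine can be sketched with an
in-process queue. This is only an analogy under stated assumptions: the real
system uses a shared message queue between separate processes, and the
function and variable names below are hypothetical.

```python
import queue

shared_messages = queue.Queue()  # filled by the (hypothetical) CIDF daemon
event_queue = []                 # consumed by the script interpreter


def daemon_deliver(sexpr_text):
    """Called by the daemon after it has decoded a valid GIDO."""
    shared_messages.put(sexpr_text)


def on_connection_finished(connection):
    """Packet-engine hook: handle the connection, then probe the queue."""
    event_queue.append(("connection_finished", connection))
    while True:
        try:
            message = shared_messages.get_nowait()
        except queue.Empty:
            break
        # Hand the S-expressions to the script layer as a CIDF event.
        event_queue.append(("cidf_message", message))


daemon_deliver("(Attack ...)")
on_connection_finished("conn-1")
```

Probing only on connection-finished events means a message may wait until the
next connection ends, which is the trade-off of reusing an existing hook.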

The CIDF Client provides the means for an IDS to encode and send GIDOs 
to other CIDF components. Whenever there is information (e.g., audit data, 
intrusion alerts, etc.) to be sent, the request is serviced by the “cidf_send” script
function. cidf_send first constructs S-expressions and writes the data to a local
file. It then invokes the CIDF Client process and passes the filename to it. The 
CIDF Client parses the S-expressions in the file and encodes them into a GIDO. 
The GIDO is in turn sent to the modeling engine or other IDSs. An example 
S-expression on a “SYN flood” attack instance is the following: 

(Attack
    (AttackSpecifics
        (AttackID 0x00000004 0x00000000)
    )
    (When
        (Time 953076974.391314)
    )
    (Message
        (Comment ’SYN flood’)
        (SourceIPV4Address 181.162.15.74)
        (TCPSourcePort 1551/tcp)
        (DestinationIPV4Address 152.1.205.85)
        (TCPDestinationPort 80/tcp)
        (TCPConnectionStatus 1)
    )
)
The Matchmaker and CA The CIDF matchmaker is used by the components
to locate “partners” with which they can communicate. For our implementation,
the matchmaker is integrated with the CA server, i.e., they are the same
process. The matchmaker maintains a list of all CIDF components and their
roles (i.e., modeling engine, event engine, etc.). New components and roles can be
added as desired. When a component needs to communicate with another
component, it can do so in either of two ways. If it knows the IP address or hostname
of the other component, then it can contact that component directly for
communication. In this case, our matchmaker does not play any active role. However, if
the component only knows that it wants to communicate with a CIDF component
with a certain capability, it sends this request to the matchmaker along with
any other criteria. The matchmaker then looks up the list of IDSs that satisfy
the criteria specified and returns the “matched” list to the component. If no
suitable match is found, then a “matchmaking failure” message is returned.
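
A capability-based lookup of this kind can be sketched as a small registry.
The class, its method names, and the example addresses are assumptions made
for illustration; the failure case is modeled here as a Python exception
rather than a protocol message.

```python
class Matchmaker:
    """Toy registry of CIDF components keyed by address."""

    def __init__(self):
        self._components = {}  # address -> set of roles/capabilities

    def register(self, address, roles):
        self._components.setdefault(address, set()).update(roles)

    def lookup(self, required):
        """Return the addresses offering every capability in `required`."""
        wanted = set(required)
        matches = sorted(
            address
            for address, roles in self._components.items()
            if wanted <= roles
        )
        if not matches:
            raise LookupError("matchmaking failure")
        return matches


mm = Matchmaker()
mm.register("10.0.0.1", ["ids"])
mm.register("10.0.0.2", ["modeling engine"])
print(mm.lookup(["modeling engine"]))
```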

The CA provides authentication of components and enables their secured
communications using RSA public key encryption technology. When components A
and B need to establish a secured communication, they need to first authenticate
each other: A first uses its private RSA key to sign a random nonce generated
and given to it by B; it then returns this signed nonce along with its own random
nonce to B, who verifies the signature on it using A’s public RSA key, signs
A’s nonce using its own private RSA key and sends the signed nonce back to A;
A then verifies the signature using B’s public RSA key. Once the components
authenticate each other, they can use their RSA keys to establish a per-session
secret key. The DES (Data Encryption Standard) secret key algorithm is then used to
encrypt the data transmission between the components. Before encrypting any
data to be sent to its peer, a component compresses it using a lossless data
compression algorithm. Similarly, the peer must decompress the data after
decrypting it. Compression not only reduces the size of the data being transmitted,
but also provides an added layer of data confidentiality.
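
The message order of the mutual authentication can be traced with a toy
example. The sketch uses textbook RSA with tiny primes and no padding, which
is insecure and serves only to show who signs and who verifies what; the key
sizes, the hashing step, and all function names are assumptions, and the
session-key establishment, compression, and DES encryption steps are omitted.
It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib
import secrets


def make_keypair(p, q, e=17):
    """Textbook RSA from two primes; tiny and insecure, for illustration."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))  # (public, private)


def digest(nonce, n):
    # Reduce a SHA-256 hash of the nonce into the range of the modulus.
    return int.from_bytes(hashlib.sha256(nonce).digest(), "big") % n


def sign(nonce, private):
    n, d = private
    return pow(digest(nonce, n), d, n)


def verify(nonce, signature, public):
    n, e = public
    return pow(signature, e, n) == digest(nonce, n)


a_pub, a_priv = make_keypair(61, 53)
b_pub, b_priv = make_keypair(89, 97)

nonce_b = secrets.token_bytes(16)      # B -> A: challenge
sig_a = sign(nonce_b, a_priv)          # A signs B's nonce
nonce_a = secrets.token_bytes(16)      # A -> B: signature plus own nonce
assert verify(nonce_b, sig_a, a_pub)   # B checks A's signature
sig_b = sign(nonce_a, b_priv)          # B signs A's nonce, B -> A
assert verify(nonce_a, sig_b, b_pub)   # A checks B's signature
```

Fresh random nonces are what prevent an eavesdropper from replaying an old
signed response in a later session.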

Certificates signed by the CA are used by the components to exchange their
public RSA keys in the first place. A certificate contains the following fields: the ID
of the CIDF component (i.e., the IP address of the host it is running on); the
component’s public RSA key; and the current time stored as a timestamp. A certificate is
considered valid (i.e., not yet expired) if its timestamp is within the past 90
minutes. We assume that all components know the public RSA key of the CA 
in advance. This is needed to authenticate the CA and also verify the signature 
on certificates signed by the CA. This public RSA key must be distributed to 
the components by some other means such as manual entry. 
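
The 90-minute validity rule can be written as a small check. The certificate
is modeled as a dict with hypothetical field names, and verification of the
CA's signature on the certificate is deliberately left out of this sketch.

```python
import time

CERT_LIFETIME_SECONDS = 90 * 60  # the 90-minute window stated above


def certificate_is_fresh(cert, now=None):
    """Check only the timestamp; signature verification is out of scope."""
    now = time.time() if now is None else now
    age = now - cert["timestamp"]
    return 0 <= age <= CERT_LIFETIME_SECONDS


cert = {
    "component_id": "152.1.205.85",   # IP address of the component's host
    "public_key": (3233, 17),         # placeholder key material
    "timestamp": 1000.0,
}
```

Rejecting negative ages also guards against certificates stamped in the
future, for example because of clock skew between hosts.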

5 Experiments 

We deployed our experimental system in our campus network where Bro, NFR, 
and MADAM ID are in separate subnets. We conducted a series of experiments 
to test the limits and strength of this system. We describe some of the findings 
here. 
Our first set of experiments was designed to test the interoperability of the
CIDF components. The results were then used to fine-tune the implementation
of our system. For example, we simulated a “SYN flood” attack against the subnet
that Bro was monitoring. Bro detected the attack and sent a GIDO describing the
instance to NFR. We compared the S-expressions on both the sending (i.e., Bro) and
receiving (i.e., NFR) ends to verify that they matched. In another experiment,
we removed the “SYN flood” rule from Bro so that it could not detect the attack as
a specific intrusion, but rather as an anomaly based on the unusual traffic statistics
caused by the attack. Bro then sent the tcpdump data to MADAM ID for a
new intrusion detection model. We verified that a new rule was computed and
distributed to both Bro and NFR. Both IDSs were able to translate the rule into
their local script functions.

Our second set of experiments was designed to test the limit of the model- 
ing engine. In a set of “timing” experiments, we measured how long it took for 
MADAM ID to compute a detection model after it received the audit data. We 
found that the results had a wide range, from mere seconds to a few hours. Upon 
detailed analysis, we discovered that the automated iterative process of mining 
data, constructing features, computing classifiers, and evaluating performances 
requires more guidance in its heuristic settings of parameters for the programs. 
Otherwise, the exhaustive search can be very slow. For example, detecting “SYN 
flood” may require patterns of “same destination host and same service” be com- 
puted and compared (with normal patterns), while “Port-scan” requires “same 
destination host and different service” patterns. We are studying whether it is 
feasible for an IDS to include a “high-level” description of the attack behavior 
in the GIDO so that the modeling engine can have a better chance of setting the 
correct parameters for the data mining algorithms. In a set of “modeling”
experiments, we tried to discover whether some new models could not be expressed as
S-expressions because of our current assumptions that every IDS “understands”
the same bootstrap set of features and that new features can be expressed
using the existing features. We indeed found that for intrusions that require new
pre-processing from raw audit data (hence new bootstrap features), it is
virtually impossible to express the complex and delicate data pre-processing logic
in S-expressions. For example, “teardrop” attacks require special processing of
IP fragments, which can only be expressed in a high-level programming language
such as C/C++. In fact, human experts must get involved in the modeling process
when such new data pre-processing is required (since without these new features
defined by human experts, the automatic data mining process will not be able
to produce a good model and will terminate when it uses up the allotted CPU
time in the model building process). We are studying how to distribute new data
pre-processing code in CIDF.

Our third set of experiments was intended to verify that the IDSs could work
together to detect distributed attacks. We used the Tribal Flood Network (TFN)
DDOS attack tool for our experiment. Bro monitored the subnet of the target
host and detected the attack, which was launched by several slave programs,
each running on a separate host, in the subnet monitored by NFR. When NFR
received the attack instance report from Bro, it parsed the GIDO and recognized
that a SYN flood seemed to have originated from its subnet (by checking the
SourceIPV4Address SID). It then activated an N-code that detects and blocks
the “control message” (ICMP echo reply packets) sent by the attack “masters”
to the “slaves”. It also launched a search for the slave programs, by informing
a special daemon on each system in its subnet to look for and kill any running
process with a name that matches any in the list of known attack programs. Our
results showed that within a few seconds after Bro detected the attack, NFR was
able to kill all attacking programs.
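
The check that the reported source address lies inside the receiving IDS's
own subnet can be sketched with Python's standard `ipaddress` module. The
function name and the /24 subnet below are assumptions for illustration; the
source address is the one from the example attack report above.

```python
import ipaddress


def originates_locally(source_ip, local_subnet):
    """True when the reported attack source lies inside the local subnet."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(local_subnet)


# Hypothetical subnet monitored by the receiving IDS.
LOCAL_SUBNET = "181.162.15.0/24"
if originates_locally("181.162.15.74", LOCAL_SUBNET):
    print("attack source is local: activate slave-detection modules")
```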

6 Related Work 

EMERALD [15] provides an architecture to facilitate enterprise-wide deployment
and configuration of intrusion detectors. A “resolver” is used to combine the
alarms from the distributed detectors to make a determination of the state of
the (entire) network. This is certainly the right direction for detecting coordinated
attacks against the enterprise. The resolver technology can also be utilized for
event analysis in a CIDF environment. The scope of the system, however, is limited to an
enterprise. We are more interested in the problem of how IDSs can collaborate
over the Internet, and more importantly, how to automatically produce and 
distribute new intrusion detection models for “novel” attacks. 

Kephart et al. [6] outlined a system architecture where anti-virus systems 
across the Internet can subscribe to a centralized virus modeling server to receive 
fast updates whenever a new virus is discovered and a new anti-virus module 
is produced. This is very similar to our idea of adding the modeling engine 
to CIDF. Our system has the additional capability of facilitating the IDSs to 
exchange attack information to detect distributed intrusions. 

7 Conclusion and Future Work 

In this paper, we discussed the need for new techniques to detect novel intrusions 
as well as distributed attacks. We proposed to add a modeling service to CIDF 
so that IDSs can not only exchange attack data to detect distributed intrusions, 
but also receive detection models once a new attack method surfaces. We
described the underlying technologies of our approach, namely, MADAM ID, a
data mining framework for automatically building intrusion detection models, 
and CIDF, a framework for IDAR components to collaborate. We discussed the 
design and implementation of an experimental system, which uses MADAM ID 
as the modeling engine, and Bro and NFR as the real-time IDSs. Although our 
experiments are still preliminary, the promising results showed that these com- 
ponents can interoperate to detect distributed attacks, and can produce and 
distribute new intrusion detection models. 

As for future work, we plan to first conduct more extensive and robust
experiments. We will install the components of our experimental system in separate
domains over the Internet for a new set of “timing” experiments. We will also
run an extensive set of attacks, for example, those that are generated by the DARPA
“red team”, to test whether our system can indeed achieve better detection
performance than a single system.

We will continue to develop the underlying technologies of our system. In 
particular, we will investigate how to improve the automated process of build- 
ing intrusion models, and how to encode and distribute features and rules that 
require detailed system knowledge and are beyond the scope of current CISL.

Abstract. The use of program execution traces to detect intrusions has
proven to be a successful strategy. Existing systems that employ this 
approach are anomaly detectors, meaning that they model a program’s 
normal behavior and signal deviations from that behavior. Unfortunately, 
many program-based exploits of NT systems use specialized malicious 
executables. Anomaly detection systems cannot deal with such programs 
because there is no standard of “normalcy” that they deviate from. 

This paper is a preliminary report on an attempt to remedy that situ- 
ation. We report on a prototype system that learns to identify specific 
program behaviors. Though the goal is to identify malicious behavior, in 
this paper we report on experiments seeking to identify the behavior of 
the web-browser, since we did not have enough exemplars of malicious 
behavior to use as training data. 

Using automatically generated finite automata, we search for features in 
execution traces that allow us to distinguish browsers from other pro- 
grams. In our experiments, we find that this technique does, in fact, allow 
us to distinguish traces of Internet Explorer from traces of programs that
are not web browsers, after training with Netscape and a different set of 
non-browsers. 

Keywords: machine learning, finite automata, feature detection, data 
mining 



1 Introduction 

Many kinds of malicious activity in information systems can be detected by 
monitoring execution traces. Broadly speaking, these are just compact synopses 
of what a program does as it executes. The idea of using execution traces for 
intrusion detection was pioneered by [2], where the execution traces record what
system calls a program makes, and the intrusion detector tries to decide whether 
a given execution trace reflects normal behavior for that program. 

The idea of looking for features that identify malicious execution traces brings 
to mind the idea of signature detection. Many signature detection systems [5,7] 
do exactly that: look for features that might be used to identify malicious
programs. Unfortunately, the signatures in question are usually created by hand,
and this is time-consuming. It is also hard to determine how well a signature-
based system generalizes. Finally, existing signature detection systems do not 
use execution traces, and we would like to investigate the possibility of doing so, 
due to the success of execution-trace-based systems in detecting other intrusions 
that employ executable programs. 

It would therefore be appealing to acquire signatures automatically — based 
on execution traces — with machine learning algorithms. Not only does this 
lead to an automated process, but the generalization ability of machine-learning 
algorithms is much better understood than the generalization ability of human- 
generated rules (see [1,10]). This is because the machine learning algorithm can 
be scrutinized while the human’s thought-processes cannot. 

This paper presents the basis of a technique for identifying malicious exe- 
cution traces with automatically-learned finite automata. What we address is 
the process of finding features that distinguish malicious execution traces from 
benign ones. In other words, we discuss a data-mining technique for program 
execution traces. 

We use training data that contains exemplars of malicious execution traces 
as well as benign ones (how we obtain these traces is described in Section 2.1). 
All traces are thrown together and used to construct a finite state machine using 
a process we describe in Section 2.2. Once the FSM has been built, we identify 
the transitions that appear only in the malicious traces (Section 3). When a 
novel execution trace exercises such a transition, this is taken as evidence that 
the new trace is also malicious. 

In Section 4, we present the results of an experiment that suggests this
technique can, in fact, lead to generalization; we were able to identify specific behavior
in programs whose execution traces were not used during training. 

Unfortunately, our corpus of execution traces from real malicious executables 
was not diverse enough for machine learning at the time of these experiments, 
so we used an artificial definition of “malicious” behavior; we attempted to iden- 
tify Web browsers based on their execution traces. We trained our FSMs using 
Netscape as the “malicious” program, and attempted to identify not only novel 
execution traces from Netscape, but also traces from Internet Explorer, which 
was not used for training. We also used traces from a number of haphazardly 
selected programs as exemplars of “benign,” i.e., non-browsing behavior. In this
particular case, we found that we could, indeed, find features that distinguish 
execution traces of Internet Explorer from traces of other programs not used 
during training. 

The process we used involved human intervention: the choice of benign programs
used for training was modified twice before we achieved the desired results. In 
a full-fledged machine learning system, this would be done automatically, and 
the system we will describe here would be regarded as a mechanism for feature 
selection. At the end of the paper, we briefly discuss some ways in which the 
entire process might be automated.

2 Constructing Finite Automata from Audit Information

Our approach to learning intrusion signatures has two steps: first, we distill 
the audit information down to a series of symbolic audit events, and then we 
use these audit traces to construct a finite automaton with certain transitions, 
labelled as “bad,” that are believed to be exercised only by malicious programs. 

The next two subsections outline our construction of the audit traces and 
the construction of finite automata from the audit traces. 

2.1 Preprocessing of Audit Data 

Currently, our behavior data is obtained from NT security logs, making this an 
off-line prototype. The NT auditing system captures various actions performed 
by executing programs, such as the invocation of a new program by an existing
program, the termination of a program, and access to resources. For each pro- 
gram whose invocation is recorded in the audit log, we distill an execution trace 
by recording these basic events. That is, each event is associated with a unique 
number, and that number is recorded in the execution trace whenever the event 
occurs. 

When a resource is accessed, an annotation in the audit log describes the 
way in which it was accessed. For example, a log entry might record that a file 
was accessed for reading and writing. We treat each such access as an execution 
event. For example, an audit log entry describing a file access for read and write
would result in two entries, a read and a write, being recorded in our execution
trace, each represented by its event number.

This means that the execution trace may not faithfully record the order of 
the operations performed on an object in a single access. For example, if a single 
object access is an access for reading and an access for writing, then the audit 
log does not record whether the read ultimately takes place before the write or 
vice versa. Therefore, our execution traces also have the read and the write in a 
standard, canonical order (the read first and the write second) regardless of the 
order in which they actually took place. 
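
The preprocessing step described above can be sketched as follows. This is only an illustration under stated assumptions: the event names, the event numbers in `EVENT_IDS`, and the dictionary layout of a log entry are hypothetical, since the text does not give the actual NT event set or log format.

```python
# Hypothetical event numbering; the real set of NT audit events is not listed in the text.
EVENT_IDS = {"process_start": 1, "process_exit": 2, "read": 3, "write": 4}

# Canonical order for access types found in a single log entry (read before write).
CANONICAL_ORDER = ["read", "write"]


def entry_to_events(entry):
    """Turn one (hypothetical) audit-log entry into a list of event numbers."""
    if entry["kind"] == "access":
        # One event per access type, always emitted in the canonical order,
        # because the log does not say which access actually happened first.
        modes = sorted(set(entry["modes"]), key=CANONICAL_ORDER.index)
        return [EVENT_IDS[mode] for mode in modes]
    return [EVENT_IDS[entry["kind"]]]


def build_trace(entries):
    """Concatenate the event numbers of all entries for one program."""
    trace = []
    for entry in entries:
        trace.extend(entry_to_events(entry))
    return trace


log = [
    {"kind": "process_start"},
    {"kind": "access", "modes": ["write", "read"]},  # logged as a single access
    {"kind": "process_exit"},
]
trace = build_trace(log)
print(trace)
```

The single read/write access contributes two events to the trace, with the read placed first regardless of how the modes were listed in the entry.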

However, the order of accesses may not be critical, since the execution traces 
are used to identify programs, not to determine their exact semantics. Indeed, the 
results of our experiments suggest that the execution traces do contain enough 
information, at least for this particular application. 

2.2 State-Merging Algorithms for Learning FSMs 

State merging algorithms start with a prefix tree (also called an acceptor tree) 
describing the training data. The edges of an acceptor tree are labeled with the 
events that can occur in the training data; in this case it is the set of events that 
might be extracted from an NT security audit log. Each sequence of events in the 
training data corresponds to the edge labels on a path through the acceptor tree, 
starting at the root, and, if there is a unique end symbol, ending in a leaf. (This 
construct is a prefix tree because each prefix of each sequence is also represented
by a path starting at the root.) Figure 1A shows an acceptor tree for
the sequences baadb, baadc, babb, and babc.
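
A minimal sketch of acceptor-tree construction is given below, using single characters as events. The function name and the dictionary representation are illustrative choices, and the node numbers are assigned in creation order, so they need not match the numbering used in Figure 1.

```python
def build_prefix_tree(sequences):
    """Build an acceptor (prefix) tree.

    Returns (children, counts), where children[node][symbol] is the child
    reached from `node` on `symbol`, and counts[node] is the number of
    training sequences whose path passes through `node`.
    """
    children = {0: {}}  # node 0 is the root
    counts = {0: 0}
    for seq in sequences:
        node = 0
        counts[0] += 1
        for symbol in seq:
            if symbol not in children[node]:
                new_id = len(children)  # next unused node number
                children[node][symbol] = new_id
                children[new_id] = {}
                counts[new_id] = 0
            node = children[node][symbol]
            counts[node] += 1
    return children, counts


children, counts = build_prefix_tree(["baadb", "baadc", "babb", "babc"])
after_ba = children[children[0]["b"]]["a"]  # node reached by the prefix "ba"
print(len(children), counts[after_ba], sorted(children[after_ba]))
```

All four sequences share the prefix "ba", so the node reached by that prefix has a count of four and two outgoing edges.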



Fig. 1. Illustration of the state-merging approach to learning finite automata. (A)
shows the tree constructed from the four strings baadb, baadc, babb, and babc. Note
that each path from the root to a leaf has labels corresponding to one of the strings.
(B) shows the result of merging states 4 and 5 in the previous figure. These states were
merged because the subtrees rooted at states 4 and 5 in (A) are the same. The decision
of what states to merge can be made in other ways as well.



Learning takes place by selectively merging states in the acceptor tree; this is 
illustrated in Figure 1B. State merging algorithms differ in their choices of which
states will be merged (and this can include a decision not to merge certain states 
with any others). 

At one extreme, one can simply traverse the tree and treat the nodes one at 
a time, deciding whether to merge the node, and what node to merge it with. 
At the other extreme, one could devise an optimality condition for the entire 
tree and then treat the whole learning problem as a global optimization problem 
(with any merger or separation of nodes being permissible at any time).

Blue-fringe algorithms (see [4]) are a class of algorithms that lie at a midpoint
between these extremes. A blue-fringe algorithm partitions the set of nodes
into three sets: the red nodes, which can no longer be merged with one another; 
the blue nodes, which may only be merged with red nodes, and the white nodes, 
which have never been merged and are not yet candidates for merging. Initially, 
the root node is red and its children are blue, and a blue-fringe algorithm pre- 
serves the following invariants: 

1. The red nodes form an arbitrary graph but cannot be merged with one
another;

2. Each child of a red node is either red or blue;

3. Each blue node is the root of a tree.

Merging a red and blue node results in a red node, meaning that the children 
of the blue node must, themselves, be promoted to blue, in order to preserve 
the invariants. The algorithm may also decide to promote a blue node to red 
without a merge, meaning that there is no red node suitable for merging; here, 
too, white nodes must be promoted. Finally, the merger of a red and a blue 
node may cause the automaton to become nondeterministic, and this problem 
is solved by recursively merging the appropriate children. This can also lead to 
promotions. 
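
The bookkeeping behind a promotion can be sketched as below. This is a simplified illustration, not the full algorithm: it covers only the promotion of a blue node to red (no merging or determinization), and the function names and the `children` dictionary layout are assumptions made for the example. White nodes are simply those in neither set.

```python
def promote(node, children, red, blue):
    """Promote a blue node to red; its children become blue unless already red."""
    blue.discard(node)
    red.add(node)
    for child in children[node].values():
        if child not in red:
            blue.add(child)


def invariant_holds(children, red, blue):
    """Check invariant 2: each child of a red node is either red or blue."""
    return all(
        child in red or child in blue
        for r in red
        for child in children[r].values()
    )


# A small acceptor tree: 0 -b-> 1 -a-> 2, and node 2 has children 3 and 4.
children = {0: {"b": 1}, 1: {"a": 2}, 2: {"a": 3, "b": 4}, 3: {}, 4: {}}
red, blue = {0}, {1}  # initially the root is red and its children are blue

promote(1, children, red, blue)
promote(2, children, red, blue)
print(sorted(red), sorted(blue), invariant_holds(children, red, blue))
```

After the two promotions, the formerly white nodes 3 and 4 have become blue, which is what keeps the invariant intact.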

The decision of whether or not to merge two states is often based on the 
structure of their offspring. For example, the traditional algorithm of [8] merges 
two nodes if the children of the blue node, which form a tree due to the invariants, 
can be superimposed on the offspring of the red node without any mismatches. 

In our application, the acceptor tree is much more sparse than in many other 
FSM-learning problems. Each node could have as many children as there are 
possible audit events (106 in our experiments), but in general, the number of 
children is far smaller. This makes it hard to use the structure of the offspring as 
a guide to merging nodes; the graph or tree formed by these offspring is generally 
incomplete (as compared to a hypothetical acceptor tree where every possible 
execution path was represented). 

Therefore, our algorithm only checks the subtrees to a certain depth. This 
idea was used in [3] to improve performance, but here, we use it because it 
reduces the amount of information used in deciding when to merge, and thus 
reduces the likelihood of a bad merge caused by missing information. (Adopting
the terminology of [3], we refer to the bounded-depth subtree of a node as the 
signature of that node.) 

The signature is thus simply a smaller acceptor tree, and each string accepted
by the signature tree of node k consists of the ith through (i + D)th symbols
of some execution trace in the training data, where i is the depth of node k,
and D is the signature depth. For example, the depth-2 signature of node 2 in
Figure 1A accepts the strings ad, bb, and bc. We will call these strings (the
ones accepted by the signature tree of node k) the signature strings of node k.
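
As a rough sketch, signature strings can be read off a bounded-depth walk of the subtree below a node. The nested-dictionary tree and the helper names here are illustrative assumptions; a path that ends before the depth bound is returned as a shorter string.

```python
def build_tree(sequences):
    """Acceptor tree as nested dicts: each node maps a symbol to its child node."""
    root = {}
    for seq in sequences:
        node = root
        for symbol in seq:
            node = node.setdefault(symbol, {})
    return root


def descend(root, prefix):
    """Follow `prefix` from the root and return the node it reaches."""
    node = root
    for symbol in prefix:
        node = node[symbol]
    return node


def signature_strings(node, depth):
    """Strings labelling the paths of the depth-bounded subtree rooted at `node`."""
    if depth == 0 or not node:
        return [""]
    result = []
    for symbol, child in sorted(node.items()):
        for tail in signature_strings(child, depth - 1):
            result.append(symbol + tail)
    return result


tree = build_tree(["baadb", "baadc", "babb", "babc"])
node_after_ba = descend(tree, "ba")
print(signature_strings(node_after_ba, 2))  # ['ad', 'bb', 'bc']
```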

We use the signature strings to implement the following heuristic: we compare 
the estimated probability densities of the signature strings of two nodes, in order 
to guess whether the two nodes should be merged. (That is, we merge the nodes 
if the probability densities appear to be similar). To understand this approach, 
notice that some nodes in an acceptor tree, such as node 2 in Figure 1A, are
reached by more than one training example. In fact, all four training examples
reach node 2 in Figure 1A, since all four start with the symbols b a. In
Figure 1B, after a merge has taken place, all the examples also reach node
4, since all examples start with either b a a d or b a b. The probability
of a signature string s, given that we reach node k, can thus be defined as the
probability of seeing s after reaching node k. Formally, it is the probability of seeing
an execution trace whose ith through (i + D)th symbols form the string s, given that
the 1st through (i - 1)st symbols in the trace correspond to the edge labels on
the path starting at the root of the tree and ending at node k. Here i is the depth
of node k, as above. (Note that the heuristic is implicitly based on the assumption
that the nodes act like states in a Markov process, which need not be true of all
programs.)

Let n_k^s denote the number of times we see the signature string s after we
reach state k, and let n_k denote the number of training examples that reach
state k in the first place. Then, the probability estimate for the string s, given
that we are at node k, is just n_k^s / n_k. We will use p(s|k) to denote this estimate.
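
The estimate can be illustrated directly on the training strings of Figure 1. In this sketch a node of the unmerged acceptor tree is identified by the prefix that leads to it, which is an assumption of the example (after merging, several prefixes may lead to the same state); the function name is likewise hypothetical.

```python
from collections import Counter


def signature_probabilities(traces, prefix, depth):
    """Estimate p(s | k) for the acceptor-tree node k reached by `prefix`.

    The count of traces reaching k is the number of traces starting with
    `prefix`; each signature string's count is taken over those same traces.
    """
    reaching = [t for t in traces if t.startswith(prefix)]
    n_k = len(reaching)
    start = len(prefix)
    counts = Counter(t[start:start + depth] for t in reaching)
    return {s: n / n_k for s, n in counts.items()}


traces = ["baadb", "baadc", "babb", "babc"]
p = signature_probabilities(traces, "ba", 2)
print(p)  # {'ad': 0.5, 'bb': 0.25, 'bc': 0.25}
```

Two of the four traces that reach the node continue with "ad", so that string receives probability 0.5, and the estimates over all signature strings of a node sum to one.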

To score the quality of a merge between two states j and k, we use the L_d
metric, which is defined as

$$L_d(j, k) = \left( \sum_{i=1}^{m} \left| p(s_i \mid j) - p(s_i \mid k) \right|^{d} \right)^{1/d}$$

where m is the total number of possible signature strings, and p(s_i|k) is taken as
zero if the string s_i never reaches state k. Here, d may be any positive number
and its actual value is a parameter of the training algorithm.
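
A small sketch of the distance computation follows. It assumes the usual Minkowski form of an L_d distance (absolute differences raised to the power d, summed, then the 1/d root); the function name and the dictionary representation of the probability estimates are illustrative. Summing only over strings seen at either node is equivalent to summing over all possible strings, because unseen strings contribute zero.

```python
def l_d_distance(p_j, p_k, d=1.0):
    """Minkowski-style distance between two signature-string distributions.

    p_j and p_k map signature strings to estimated probabilities; a string
    missing from a dictionary is treated as having probability zero.
    """
    strings = set(p_j) | set(p_k)
    total = sum(abs(p_j.get(s, 0.0) - p_k.get(s, 0.0)) ** d for s in strings)
    return total ** (1.0 / d)


same = l_d_distance({"b": 0.5, "c": 0.5}, {"b": 0.5, "c": 0.5})
different = l_d_distance({"ad": 0.5, "bb": 0.25, "bc": 0.25}, {"bb": 1.0})
print(same, different)
```

Identical distributions score zero, which is the best possible merge score, while distributions with little overlap score higher.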

In summary, when we consider merging two nodes j and k, we judge the 
quality of the merge by looking at L_d(j, k) for some d; a smaller value indicates
a better merge because it indicates that the (estimated) probabilities of all the
strings are closer together. If there are several pairs of nodes that we are thinking
about merging, we will only merge the one pair that has the lowest L_d score.

It may be that all the merges we are considering are so poor that no merge 
should take place at all. This happens when each merge has an L_d measure
greater than some threshold set by the user. When this happens, we perform a 
promotion (as described above) instead of a merge. 

3 Data-Mining Using Learned FSMs 

Our goal is to discover features that can be used to distinguish malicious exe- 
cutables from non-malicious ones. Therefore, our approach uses labeled training 
data; a given sequence of audit events can be labelled good, meaning that the 
executable is not malicious, or bad, meaning that the executable is malicious. 
The approach works as follows: 

1. Build an FSM from the combined good and bad execution traces.

2. If any transition in the FSM is only exercised by bad traces, then label that 
transition as bad. All other transitions are labelled as good. 

During testing, we follow the trace of a new program through the FSM to deter- 
mine if the executable should be considered malicious. We can either keep score 
of the number of bad transitions that the trace exercises, or else we can raise an 
alarm right away when a bad transition is exercised. 
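
The labeling and testing steps can be sketched as follows. To keep the example short, an unmerged prefix tree stands in for the learned FSM, which is an assumption of the sketch and not the state-merging procedure of Section 2; all function names are illustrative.

```python
def build_fsm(traces):
    """Unmerged prefix tree used as a stand-in for the learned FSM."""
    fsm = {0: {}}  # fsm[state][event] -> next state; state 0 is the start state
    for trace in traces:
        state = 0
        for event in trace:
            if event not in fsm[state]:
                new_state = len(fsm)
                fsm[state][event] = new_state
                fsm[new_state] = {}
            state = fsm[state][event]
    return fsm


def walk(fsm, trace):
    """Yield the (state, event) transitions a trace exercises, stopping if it falls off."""
    state = 0
    for event in trace:
        if event not in fsm[state]:
            return
        yield (state, event)
        state = fsm[state][event]


def label_bad_transitions(fsm, good_traces, bad_traces):
    """A transition is bad only if bad traces exercise it and no good trace does."""
    used_by_good = {t for trace in good_traces for t in walk(fsm, trace)}
    used_by_bad = {t for trace in bad_traces for t in walk(fsm, trace)}
    return used_by_bad - used_by_good


def count_bad(fsm, bad_transitions, trace):
    """Number of bad transitions exercised by a new trace."""
    return sum(1 for t in walk(fsm, trace) if t in bad_transitions)


good, bad = ["abc", "abd"], ["abe", "xz"]
fsm = build_fsm(good + bad)
bad_transitions = label_bad_transitions(fsm, good, bad)
print(count_bad(fsm, bad_transitions, "abe"), count_bad(fsm, bad_transitions, "abc"))
```

The shared prefix "ab" is exercised by both good and bad traces, so only the final transition of "abe" counts against it. Raising an alarm on the first bad transition corresponds to testing whether the count is nonzero.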

Note that, in our approach, a transition is good if it is exercised by both good
and bad traces from the training data. This isolates those transitions that are only
exercised by malicious executables; by looking for behavior that is exclusive to
malicious executables, we try to find the features that can be used to identify 
them. We could also have taken the opposite approach, calling any transition 
bad in case of doubt and thus isolating the features that can be used to identify 
non-malicious executables. The advantage of our approach is that it still works if 
some non-malicious traces are accidentally included with the malicious ones. This 
is meant to simplify the collection of training data; if we know that an intrusion 
occurred near a certain time, but we are unsure about which executables were 
involved in carrying out the attack, we can simply throw all execution traces 
occurring around that time into the bad category. 

The overall state-merging algorithm we used was described in Section 2. 
Some specific details are that we used signatures of depth 2 when comparing
two nodes to see if they could be merged, and that we used the L_1 distance for
comparison (that is, we used the L_d distance described in Section 2, with d = 1).
The red-blue pair with the lowest L_1 score was merged, unless the lowest score
was greater than 1, in which case the shallowest blue node was promoted to red.
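
The merge-or-promote rule can be written as a small decision function. This is a sketch under assumptions: the scores are taken as already computed for each candidate red-blue pair, and the function name, node labels, and argument layout are hypothetical.

```python
def choose_action(scores, blue_depths, threshold=1.0):
    """Decide between merging the best red-blue pair and promoting a blue node.

    scores maps (red, blue) pairs to their distance score; blue_depths maps
    each blue node to its depth in the tree.
    """
    best_pair = min(scores, key=scores.get)
    if scores[best_pair] > threshold:
        # Every candidate merge is too poor: promote the shallowest blue node.
        shallowest = min(blue_depths, key=blue_depths.get)
        return ("promote", shallowest)
    return ("merge", best_pair)


depths = {"b1": 2, "b2": 1}
print(choose_action({("r0", "b1"): 0.4, ("r0", "b2"): 1.3}, depths))
print(choose_action({("r0", "b1"): 1.2, ("r0", "b2"): 1.3}, depths))
```

In the first call the best score is below the threshold, so that pair is merged; in the second every score exceeds it, so the shallowest blue node is promoted.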

4 Experiments 

In this section, we describe several experiments that we used to evaluate our 
approach. In Section 4.1, we describe the experimental setup, and in Sec- 
tions 4.2, 4.3, and 4.4 we describe a series of experimental runs in which we zero 
in on so-called malicious behavior (which, for us, is the behavior of a browser). 

4.1 Obtaining Behavior Data for NT Executables 

The security logs used in our experiments came from two sources. One was the 
data for the 1999 Lincoln Labs intrusion detection evaluation, where 5 weeks of 
data was generated for traffic simulating that of an Air Force base. The second 
source was one of our own systems, which we used to capture additional traces 
for Internet Explorer, since the Lincoln data provided only one such trace. In 
both cases, the security logs were created with base-object auditing enabled. 
User-level auditing was set to collect the largest possible amount of information, 
but file and directory auditing were turned off. 

For these preliminary results, we did not have a great enough variety of 
malicious executables to get useful results. The problem with having too few 
executables is that, with so little data, there is little difference between training 
and memorization. Execution traces used to test our technique would, with high 
likelihood, also have appeared verbatim in the training data. Thus, we would only 
be testing the system’s ability to memorize training sequences, not its ability to 
generalize. Lack of variety in the training data also hurts the generalization 
ability of the trained system. 

Therefore, we used the web-browsers Netscape and Internet Explorer to stand 
in for malicious executables. The goal in these tests is to train the system with a 
number of programs, including Netscape or Internet Explorer, with the browser
program tagged as malicious. The hope is that, due to the functional
similarities between the two browsers, we can train a system that recognizes 
either browser as bad, while recognizing other programs as good. This establishes 
the ability of finite automata trained on execution traces to find features that 
identify a program’s functionality. 

In particular, if we can identify execution traces from one browser after train- 
ing only on execution traces from the other browser, we will have evidence that 
this technique can, in fact, isolate features useful for generalization in execution 
traces. To a lesser extent, it would also be useful to train on execution traces
from one browser and then recognize different execution traces from the same 
browser, since this also indicates that we can generalize from one execution-trace 
to a second, non-identical trace that still performs the same essential functions. 

4.2 Experiment 1 

In our first experiment, the bad program was Netscape, and the good programs 
were FINDFAST, LS, SPOOLSS, and advanced. Weeks 2 through 5 of the Lincoln 
data were used for training (that is, occurrences of the above programs were 
filtered from Weeks 2-5 of the Lincoln data and used for training).

Testing was done using Week 1 of the Lincoln data. The classifier was first 
tested on the programs used for training, but the execution traces were taken 
from Week 1, and were thus generally different than the traces used during 
training. In addition, the classifier was tested on dotlnetd, explorer, iexplore, 
nsbind, perl, posix, and rpcss, which were not used during training. For this 
experiment, an alert was raised whenever a bad transition occurred. 




Fig. 2. Scores for programs used for training in experiment 1. (A) FINDFAST;
(B) LS; (C) SPOOLSS; (D) advanced; (E) netscape

Fig. 3. Scores for programs not used for training in experiment 1. (A) perl; (B)
posix; (C) iexplore; (D) inetinfo; (E) nsbind; (F) rpcss



The results for the training programs are shown in Figure 2. Each program 
was executed several times during Week 1, and the different execution traces 
are plotted along the horizontal axis. The vertical axis shows the score for each 
trace, and the scores take on one of three values: 0, meaning that the execution 
trace was accepted by the automaton and not tagged as being bad, 0.1, meaning 
that the trace was rejected before traversing an edge that would cause it to be 
tagged as bad, and 1, meaning that the execution was tagged as bad. (To make 
all points visible, the vertical range of the plots goes from -0.2 to 1.2.)
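
The three-valued score can be sketched as a single pass over a trace. The small automaton, the set of bad transitions, and the function name below are made up for illustration; "accepted" is interpreted here as the trace being followed to its end without leaving the automaton.

```python
def score_trace(fsm, bad_transitions, trace, start=0):
    """Return 1.0 if tagged bad, 0.1 if rejected before any bad edge, else 0.0."""
    state = start
    for event in trace:
        if event not in fsm[state]:
            return 0.1  # fell off the automaton without exercising a bad transition
        if (state, event) in bad_transitions:
            return 1.0  # alarm raised on the first bad transition
        state = fsm[state][event]
    return 0.0  # followed to the end and never tagged


# Hypothetical automaton: fsm[state][event] -> next state.
fsm = {0: {"a": 1, "x": 2}, 1: {"b": 3}, 2: {}, 3: {}}
bad_transitions = {(0, "x")}

print(score_trace(fsm, bad_transitions, "ab"),
      score_trace(fsm, bad_transitions, "ac"),
      score_trace(fsm, bad_transitions, "x"))
```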

For the programs used to train the automaton, the results were more or less 
as expected. The four programs that the classifier was instructed to regard as 
good had low scores overall, while Netscape had high scores. However, a number 
of traces from LS had high scores; we might say that those traces triggered false 
alarms. 

Figure 3 shows the results for programs that were not used for training. 
Our goal is to identify the behavior of web browsers, so we want high
scores for Internet Explorer (iexplore) and low scores for the other programs.
Unfortunately, three of the programs generate high scores even though they 
should not do so; again, these can be seen as false alarms. Ten of the eleven 
execution traces for Internet Explorer were identified, but one was rejected before 
being tagged as bad. 

4.3 Experiment 2 

To improve this situation (that is, to eliminate the false positives caused by
nsbind, inetinfo, and rpcss, and to prevent iexplore from falling off of the
FSM), we need more training data. To provide paths for rpcss, nsbind and
inetinfo that do not involve bad transitions, it may be enough to add more
good programs, and since we only want to make the FSM accept iexplore, more
good training data might suffice for it as well. To this end, we add dotlnetd to
the training data. (We are concerned with examining the behavior of the learning
algorithm; if we were trying to reduce the number of false positives as quickly
as possible we would retrain the FSM with one or more of nsbind, inetinfo, or
rpcss labelled as good instead of using those programs only during testing.)




Fig. 4. Scores for programs used for training in experiment 2. (A) FINDFAST;
(B) LS; (C) SPOOLSS; (D) advanced; (E) dotlnetd; (F) netscape



Figure 4 shows the results of testing the programs that were also used during 
training, while Figure 5 shows the results for the programs not used during 
training. 

The results of the second experiment are encouraging. All executions of
iexplore now trigger an alert, while nsbind fails to trigger any alarms before
falling off of the FSM.

Fig. 5. Scores for programs not used for training in experiment 2. (A) perl; (B)
posix; (C) iexplore; (D) inetinfo; (E) nsbind; (F) rpcss; (G) explorer



Unfortunately, inetinfo and explorer continue to generate high scores. 
Additionally, rpcss and LS generate many false positives, in spite of the fact 
that LS was used for training. 

4.4 Experiment 3 

One way to make inetinfo stop generating false positives would be to add still 
more training data, continuing the process we began above. Unfortunately, we 
found that adding perl or posix to the training data as good programs did not 
have the desired effect.

Note that the addition of more training data does not simply add states and 
transitions to the automaton, but changes its structure, because there are more 
states available for merging. Therefore, each program we might add as a good 
program to the training data would potentially lead to a different FSM, and it 
is likely that we would sooner or later stumble across an automaton that met 
our needs simply by chance. 

However, it does not seem worthwhile to demonstrate this point empirically. 
Therefore, we simply added some execution traces from inetinfo to the training 
data, labelled as good. (Recall that the exact execution traces used for training 
are always distinct from the ones used for testing, though the traces may be 
generated by different executions of the same program.) 

This has the desired effect. Figure 6 shows the results for the programs used
as good during training and Figure 7 shows the results for the programs not used 
during training. 

The programs used for training now behave as they should, with Netscape 
alone generating high scores. LS no longer generates false alarms. 

Among the programs not used during training, the results were also satisfying;
the score for the non-browser programs was low (except for an rpcss trace
that led to a false alarm), and iexplore’s scores were all high. This is shown
in Figure 7.

Fig. 6. Scores for programs used for training in experiment 3. (A) FINDFAST; (B)
LS; (C) SPOOLSS; (D) advanced; (E) dotlnetd; (F) inetinfo; (G) netscape

5 Conclusion and Future Work 

The experiments reported above are promising. It appears relatively straight- 
forward to isolate those features of an execution trace that identify particular 
aspects of program behavior. 

We tried to identify web browsers by their execution traces, but the ultimate
goal is to identify programs with certain types of malicious behavior. Of course,
it is necessary for the programs in question to carry out their malicious activities
in similar ways; for example, we would not expect to identify back-door servers
if the programs labelled as bad during training were exploits of race conditions
being used to gain privileged access. In a sense, this approach automates the
creation of signatures to identify particular classes of malicious executables.

Fig. 7. Scores for programs not used for training in experiment 3. (A) perl; (B)
posix; (C) iexplore; (D) nsbind; (E) rpcss; (F) explorer

It is worth noting that some of the programs that were initially hard to 
distinguish from browsers, inetinfo, nsbind and rpcss, have a more browser- 
like behavior than the others because they access the network while the others 
do not. explorer, which was also harder to distinguish from browsers, could also 
be said to have behavior similar to that of a browser, though in a different sense 
since it accesses local files. This suggests that the system is creating meaningful 
measures of semantic similarity, and not being driven by features that result 
from statistical flukes. On the other hand, it would be interesting to see whether 
a totally unique browser implementation, say, a browser implemented as a perl 
script, would be detected. 

However, there is an unsatisfying aspect in the experiments just reported. We 
tweaked the training data until we got satisfactory performance. The actions we 
took were relatively intuitive, and they led to the desired result fairly quickly; in 
fact, all traces for rpcss, and ten of the eleven traces for iexplore, were really 
added after training was complete, i.e., the system did not require additional 
training to classify these traces correctly. Nonetheless, it can still be argued 
that we were certain to achieve our aims eventually in any case, if only we 
had tried sufficiently many different combinations of training data. We could
also have varied the parameters of the training algorithm, such as the depth 
of the signatures used when comparing two nodes to score a potential merge. 
We might even have tried completely different merging strategies. Therefore, the 
question is whether our results reflect the usefulness of the technique we have 
been discussing, or whether they simply reflect our own skill at homing in on a 
good training setup.

At the moment, we cannot provide a rigorous answer to this question. We can
only say that the comparative ease with which we achieved success suggests that
our approach is well suited to the task. However, techniques such as structural
risk minimization [9] or using a mixture of experts [6] allow the trial-and-error of
techniques like ours to be automated in a reasoned way. That is, a machine learning
algorithm can empirically examine a number of alternative automata trained by a
number of alternative learning techniques; it can choose the best automaton (or 
best combination of automata) based on their performance on a test sample, 
and still have some confidence of obtaining a result that works well on new data. 
This is the next step in our work.

Abstract. Inference methods for detecting attacks on information resources
typically use signature analysis or statistical anomaly detection
methods. The former have the advantage of attack specificity, but may 
not be able to generalize. The latter detect attacks probabilistically, al- 
lowing for generalization potential. However, they lack attack models and 
can potentially “learn” to consider an attack normal. 

Herein, we present a high-performance, adaptive, model-based technique 
for attack detection, using Bayes net technology to analyze bursts of traf- 
fic. Attack classes are embodied as model hypotheses, which are adap- 
tively reinforced. This approach has the attractive features of both sig- 
nature based and statistical techniques: model specificity, adaptability, 
and generalization potential. Our initial prototype sensor examines TCP 
headers and communicates in IDIP, delivering a complementary infer- 
ence technique to an IDS sensor suite. The inference technique is itself 
suitable for sensor correlation. 

Keywords: Intrusion detection, Innovative approaches, IDS coopera- 
tion, Bayes nets. 



1 Introduction 

To date, two principal classes of inference techniques have been used in intrusion 
detection systems (IDS). In signature analysis [1], descriptions of known attacks 
are encoded in the form of rules. Statistical systems [2,3,6] intend to “learn” nor- 
mal behavior from data, and then issue alerts for suspected anomalies. Whatever 
inference techniques are used in IDS, they must typically meet stringent require- 
ments of extremely high throughput and extremely low false alarm rate. In this 
paper we describe eBayes TCP, which applies Bayesian methods [4,5] as an IDS 
inference technique. 

We have developed eBayes TCP as a component of the broad EMERALD
system, which permits us to leverage a substantial component infrastructure.
Specifically, eBayes TCP is an analytical component that interfaces to the
EMERALD ETCPGEN and EMONTCP components [6]. ETCPGEN can pro- 
cess either live TCP traffic or TCPDUMP data in batch mode. EMONTCP ex- 
tracts the TCP state for a number of generally simultaneous TCP connections. 
When we refer to “events”, we mean events from EMONTCP, which already 
represents a considerable reduction from the raw TCP data.

The innovation provided by eBayes TCP is that it captures the best features
of signature-based intrusion detection as well as anomaly detection (as
in EMERALD ESTAT). Like signature engines, it can embody attack models, 
but has the capability to adapt as systems evolve. Like probabilistic compo- 
nents, it has the potential to generalize to previously unseen classes of attacks. 
In addition, the system includes an adaptive capability, which can “grow” quite 
reasonable models from a random start. 

EBayes TCP analyzes TCP sessions, which are temporally contiguous bursts 
of traffic from a given client IP. It is not very important for the system to 
demarcate sessions exactly. The analysis is done by Bayesian inference at periodic 
intervals in a session, where the interval is measured in number of events or 
elapsed time (inference is always done when the system believes that the session 
has ended). Between inference intervals, the system state is propagated according 
to a Markov model. After each inference, the system writes text and Intrusion 
Detection Internet Protocol (IDIP) alerts for sufficiently suspicious sessions. 

EBayes TCP consists of two components: a TCP-specific module that interfaces
to appropriate EMERALD components and manages TCP sessions, and
a high-performance Bayesian inference class library. The latter has the potential
not only to analyze a specific data stream, but also to serve as a fusion engine
over heterogeneous sensors.

The remainder of this paper is organized as follows. We give a brief dis- 
cussion of Bayesian inference in trees, although the reader should refer to the 
bibliography for a more in-depth treatment. This is followed by a description of 
eBayes TCP itself, including the session concept, the TCP Bayes model struc- 
ture, and the important nodes (measures) considered. After the eBayes defini- 
tion, we present our innovative approaches to model adaptation and state tran- 
sition. We follow this with results from using simulated data from the Lincoln 
Laboratory 1999 Intrusion Detection Evaluation study [7], as well as live data 
monitored in real time from our LAN. 

2 Bayesian Inference 

Mathematically, we have adapted the framework for belief propagation in causal 
trees from Pearl [4]. Knowledge is represented as nodes in a tree, where each node 
is considered to be in one of several discrete states. A node receives $\pi$ (prior,
or causal support) messages from its parent, and $\lambda$ (likelihood, or diagnostic
support) messages from its children as events are observed. We think of priors as
propagating downward through the tree, and likelihood as propagating upward. 
These are discrete distributions, that is, they are positive valued and sum to 
unity. The prior message incorporates all information not observed at the node. 
The likelihood at terminal or “leaf” nodes corresponds to the directly observable 
evidence. A conditional probability table (CPT) links a child to a parent. Its
elements are given by

$$\mathrm{CPT}_{ij} = P(\mathit{state} = j \mid \mathit{parent\_state} = i)$$

As a consequence of this definition, each row of a CPT is a discrete distribution
over the node states for a particular parent node state, that is,

$$\mathrm{CPT}_{ij} \geq 0 \quad \forall i, j$$

$$\sum_j \mathrm{CPT}_{ij} = 1 \quad \forall i$$

The basic operations of message propagation in the tree are most succinctly 
expressed in terms of vector/matrix algebra. We will adopt the convention that 
prior messages are represented as row vectors. Downward propagation of the
prior messages is achieved by multiplying the parent’s prior on the right by the
CPT, that is,

$$\pi(\mathit{node}) = \alpha\, \pi(\mathit{parent\_node}) \cdot \mathrm{CPT}$$

where $\alpha$ is a normalizing constant to ensure that the result sums to unity.
Note that since CPT is not required to be square, the number of elements in
$\pi(\mathit{node})$ and $\pi(\mathit{parent\_node})$ may be different. Since we limit ourselves to trees,
there is at most one parent per node. However, there may be multiple children, 
so upward propagation of the likelihood messages requires a fusion step. For each 
node, the $\lambda$ message, represented as a column vector, is propagated upward via
the following matrix computation:

$$\lambda_{to\_parent}(\mathit{node}) = \mathrm{CPT} \cdot \lambda(\mathit{node})$$

Note that $\lambda(\mathit{node})$ has number of elements equal to the number of states in
the node, while $\lambda_{to\_parent}(\mathit{node})$ has number of elements equal to the number of
states in the parent node. These messages are fused at the parent via elementwise
multiplication:

$$L_i(\mathit{parent}) = \prod_{c \in \mathit{children}(\mathit{parent})} \lambda_{to\_parent,\,i}(c)$$

$$\lambda_i(\mathit{parent}) = \frac{L_i(\mathit{parent})}{\sum_j L_j(\mathit{parent})}$$

Here, $L$ represents the raw elementwise product, and $\lambda$ is obtained by normalizing
this to unit sum. Finally, the belief over the states at a node is obtained
as follows:

$$\mathrm{BEL}_i = \beta\, \pi_i \lambda_i$$

where $\beta$ is a normalizing constant so that BEL has unit sum. Figure 1 illustrates
propagation in a fragment of a tree. 
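
The propagation rules of this section can be illustrated with a minimal Python
sketch for a root with two leaf children. This is not the eBayes class library;
the function names, the two-state root, and all numeric values are assumptions
chosen only to make the arithmetic easy to follow.

```python
def normalize(v):
    """Scale a non-negative vector to unit sum."""
    total = sum(v)
    return [x / total for x in v]

def pi_to_child(parent_pi, cpt):
    """Downward step: row vector times CPT (rows index parent states)."""
    n_child = len(cpt[0])
    raw = [sum(parent_pi[i] * cpt[i][j] for i in range(len(cpt)))
           for j in range(n_child)]
    return normalize(raw)

def lambda_to_parent(cpt, child_lambda):
    """Upward step: CPT times the child's lambda column vector."""
    return [sum(p * l for p, l in zip(row, child_lambda)) for row in cpt]

def fuse(messages):
    """Elementwise product of the children's messages, normalized."""
    fused = [1.0] * len(messages[0])
    for msg in messages:
        fused = [f * m for f, m in zip(fused, msg)]
    return normalize(fused)

def belief(pi, lam):
    """BEL_i proportional to pi_i * lambda_i."""
    return normalize([p * l for p, l in zip(pi, lam)])

# Hypothetical root states: (normal, attack); each child has two states.
root_pi = [0.9, 0.1]
cpt_a = [[0.8, 0.2], [0.1, 0.9]]
cpt_b = [[0.7, 0.3], [0.4, 0.6]]

# Both children are observed in state 1, so lambda = (0, 1) at each leaf.
msg_a = lambda_to_parent(cpt_a, [0.0, 1.0])   # [0.2, 0.9]
msg_b = lambda_to_parent(cpt_b, [0.0, 1.0])   # [0.3, 0.6]
root_lambda = fuse([msg_a, msg_b])
root_belief = belief(root_pi, root_lambda)
child_pi = pi_to_child(root_pi, cpt_a)
```

In this example the evidence favors the second root state nine to one, which
exactly offsets the nine-to-one prior, so the resulting belief is evenly split.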

3 Session Model

We evaluate bursts of traffic as a means of integrating the signal (observable 
evidence of an attack or anomaly). Our simplest model considers temporally 
contiguous traffic from a particular IP address as a session. To detect distributed 
attacks, we also consider traffic to a protected resource, such as a server in the 
protected LAN. When we see an event, we examine our list of active sessions 
and see if we have a match. If we do, we update the matching session with 
the new event. Otherwise, we allocate a new session structure and initialize its 
observables with the event data. We manage the growth of the session list in 
two ways. First, we have a regularly scheduled “housecleaning” operation. All 
sessions are allowed to remain active for some time interval after the last event 
seen for the session. This interval is longer if the system believes the session has 
open (active) connections, but exists even if all connections are believed to be 
closed. The timeout intervals for sessions with and without open connections are 
configuration parameters. When housecleaning is invoked, all sessions for which 
the timeout is before the time of the housecleaning operation are deallocated. 
If the session table is at a maximum size set via system configuration and an 
event for a new session is observed, we invoke a “hindmost” operation, which 
deallocates the session with the most distant last event time. The inference engine 
is invoked periodically throughout each session, and always when a session is 
deallocated. 
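
The session bookkeeping described above can be sketched as follows. This is a
simplified illustration, not the EMERALD implementation: the class name, the
dictionary-based session record, and the timeout values are assumptions, and
the points where the inference engine would be invoked are marked only by
comments.

```python
class SessionTable:
    """Keeps one session record per client IP, with bounded size."""

    def __init__(self, max_sessions, idle_timeout, open_timeout):
        self.max_sessions = max_sessions
        self.idle_timeout = idle_timeout   # used when no connections are open
        self.open_timeout = open_timeout   # longer; used with open connections
        self.sessions = {}                 # client IP -> session record

    def on_event(self, client_ip, now, opens_connection=False):
        """Update the matching session, or allocate a new one."""
        session = self.sessions.get(client_ip)
        if session is None:
            if len(self.sessions) >= self.max_sessions:
                self.hindmost()
            session = {"events": 0, "open": 0, "last": now}
            self.sessions[client_ip] = session
        session["events"] += 1
        session["last"] = now
        if opens_connection:
            session["open"] += 1
        return session

    def hindmost(self):
        """Deallocate the session with the most distant last event time."""
        oldest = min(self.sessions, key=lambda ip: self.sessions[ip]["last"])
        return self.sessions.pop(oldest)   # inference would run here

    def houseclean(self, now):
        """Deallocate every session whose timeout has already passed."""
        expired = []
        for ip, s in list(self.sessions.items()):
            timeout = self.open_timeout if s["open"] > 0 else self.idle_timeout
            if s["last"] + timeout < now:
                expired.append(self.sessions.pop(ip))   # inference would run here
        return expired

table = SessionTable(max_sessions=2, idle_timeout=60, open_timeout=300)
table.on_event("10.0.0.1", now=0)
table.on_event("10.0.0.2", now=10, opens_connection=True)
table.on_event("10.0.0.3", now=20)       # table is full: 10.0.0.1 is evicted
expired = table.houseclean(now=100)      # 10.0.0.3 has been idle too long
```

After these calls only the session with an open connection remains, because
its longer timeout has not yet elapsed.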

Identification of when a burst begins or ends is itself not an assertion that 
can be made with certainty. Even in the case that all connections are closed, 
we must be careful not to deallocate immediately, as some attacks (such as
MAILBOMB) consist of a large number of successful open/close events, any one
of which looks normal. Conversely, we must not wait indefinitely (or for the 
timeout interval) for all open connections to close, as many attacks work by 
opening connections which the attacker has no intention of completing. Again, 
a statistical determination of return to the idle state is appropriate, from the 
point of view of sensitivity as well as response time. The potential downside 
of deallocating a session prematurely is slight. At worst, we potentially report 
multiple events for the same attack (although in practice this is rarely seen). 



4 eBayes TCP Structure 

Our TCP model represents the (unobservable) session class at the root node, and 
several observed and derived variables from the TCPDUMP data as children. All 
child nodes are also leaf nodes (that is, considered observable). This structure is
represented in Figure 2, and is assumed to hold at each inference interval (or time
slice).




This structure is sometimes referred to as the naive Bayes model, and tacitly 
assumes conditional independence of the child nodes given the parent. 

In this model, we apply Bayesian inference to obtain a belief over states of 
interest (hypotheses) for the session under consideration (or more generally, a 
burst of traffic from a given IP address). In our TCP model, the session class
hypotheses at the root node are MAIL, HTTP, FTP, TELNET/REMOTE USAGE,
OTHER NORMAL, HTTP_F, DICTIONARY, PROCESSTABLE, MAILBOMB,
PORTSWEEP, IPSWEEP, SYNFLOOD, and OTHER ATTACK. The
first five represent normal usage modes, while the rest are attack modes. HTTP_F
describes a pattern of long http sessions with abnormal terminations that is frequently
seen in real-world traffic. At this point, we do not consider these sessions
to represent attacks.

EBayes-TCP is coupled with the eBayes service availability monitor, which 
learns valid hosts and services in the protected network via a process of unsuper- 
vised discovery. This enables detection of “stealth” port sweeps (scanning for a 
small, sensitive set of ports) without a predefined model of which sets of services 
are to be considered sensitive.

For each session (as determined by the session model above) we accumulate a 
number of observables. Inference is done periodically for each session, where the 
period is a configurable number of events or elapsed time, and always when the 
system believes the session has ended. The model has three types of measures: 
intensity, high water marks, and distribution. 

Intensity measures are similar to the intensity measures we employed in 
EMERALD ESTAT. Specifically, an intensity measure is an exponentially de- 
cayed count. The intensity measures in eBayes TCP are defined as: 



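$$\mathit{Event\_Intensity}_{event} = \mathit{Event\_Intensity}_{event-1}\, e^{k \Delta t} + 1.0$$

$$\mathit{Error\_Intensity}_{event} = \mathit{Error\_Intensity}_{event-1}\, e^{k \Delta t} + (\text{event has error return}\ ?\ 1.0 : 0.0)$$

where

$\Delta t$ = time between the present and the immediately preceding event
$k$ = decay constant ($< 0$)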

Intensity measures have the property that as long as behavior is normal and 
the decay constant is appropriately chosen, they do not grow without bound. 
The range of these measures is categorized to obtain the observed state values 
for the respective nodes. 
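
As an illustration of why an exponentially decayed count stays bounded under a
steady event rate, the following Python sketch implements such a counter and a
simple range categorization. The class name, the half-life of one second, and
the category thresholds are assumptions made for the example; they are not
values used by eBayes TCP.

```python
import bisect
import math

class Intensity:
    """Exponentially decayed event count with decay constant k < 0."""

    def __init__(self, k):
        if k >= 0:
            raise ValueError("decay constant must be negative")
        self.k = k
        self.value = 0.0
        self.last_time = None

    def update(self, now, increment=1.0):
        """Decay the old count by exp(k * dt), then add the new increment."""
        if self.last_time is not None:
            self.value *= math.exp(self.k * (now - self.last_time))
        self.value += increment
        self.last_time = now
        return self.value

def categorize(value, thresholds):
    """Map a measure onto a discrete node state using sorted thresholds."""
    return bisect.bisect_right(thresholds, value)

# One event per second with a half-life of one second (hypothetical values).
event_intensity = Intensity(k=-math.log(2.0))
error_intensity = Intensity(k=-math.log(2.0))
for t in range(50):
    event_intensity.update(now=float(t))                  # every event counts
    error_intensity.update(now=float(t), increment=0.0)   # no error returns

state = categorize(event_intensity.value, thresholds=[1.0, 5.0, 20.0])
```

With these settings the event intensity converges to about 2 rather than
growing with the number of events, while the error intensity stays at zero.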

High-water-mark measures track a maximum count of an underlying observ- 
able. Our system maintains a list of ports and unique hosts accessed by this 
session. From these, we derive the maximum number of open connections to 
any unique host. As a configuration option, this maximum may or may not be 
reset at each inference interval. The total number of unique ports and hosts ac- 
cessed by the session is also recorded. As with intensity measures, the range is 
categorized to obtain the respective node states. 

Distribution measures track the distribution of the categories over an underlying
discrete-valued measure. For example, for each event, we classify the
service into one of MAIL, HTTP, FTP, REMOTE, or OTHER. Over an inference interval,
we maintain a count of the number of times each was observed. The service
distribution is then obtained as

svc-dist = {count(MAIL) count(HTTP) count(FTP) count(REMOTE) count(OTHER)}

Strictly, we should divide by the total count to obtain a true distribution,
but the normalization in the internals of the Bayes inference handles this for us.

Setting the observables means setting the $\lambda$ messages at the child nodes. We
have overloaded the set_state function to accept either an observed integer state
value (in which case we set $\lambda_{obs} = 1$, $\lambda_i = 0$ for $i \neq obs$) or a distribution (in
which case we set, for example, $\lambda$ = svc-dist).
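
The two ways of setting an observable, and the reason raw counts need not be
normalized first, can be sketched in Python as below. The single function with
a type check stands in for the overloaded set_state described above; its
signature, the CPT values, and the two parent states are assumptions for
illustration only.

```python
def set_state(n_states, observation):
    """Return the lambda message for a leaf node.

    An int is treated as an observed state index (one-hot lambda); any other
    sequence is treated as a vector of counts or a distribution.
    """
    if isinstance(observation, int):
        lam = [0.0] * n_states
        lam[observation] = 1.0
        return lam
    if len(observation) != n_states:
        raise ValueError("observation length must equal the number of states")
    return [float(x) for x in observation]

def normalized_lambda_to_parent(cpt, lam):
    """CPT times lambda, scaled to unit sum."""
    raw = [sum(p * l for p, l in zip(row, lam)) for row in cpt]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical CPT for the service node: 2 parent states x 5 service classes
# ordered as (MAIL, HTTP, FTP, REMOTE, OTHER).
svc_cpt = [
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.80, 0.05, 0.05, 0.05, 0.05],
]

one_hot = set_state(5, 2)
from_counts = normalized_lambda_to_parent(svc_cpt, set_state(5, [8, 1, 0, 0, 1]))
from_dist = normalized_lambda_to_parent(
    svc_cpt, set_state(5, [0.8, 0.1, 0.0, 0.0, 0.1]))
```

Because the message sent to the parent is normalized, the raw counts and the
corresponding true distribution yield the same result.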

5 Adaptive Capability 

Our system can potentially adapt by reinforcing its built-in models for the cur- 
rent observation (adjusting rows in the CPTs corresponding to the observed 
state at the parent) or by adding a new state (hypothesis) at the parent if the 
current observation is not in good agreement with any of the currently modeled 
hypotheses. 

5.1 Adaptive CPT Adjustment 

Adaptation via reinforcement proceeds as follows. We recall that the CPT re- 
lates a child node to its parent. In our representation, the rows of the CPT 
correspond to parent states, while the columns correspond to child states. If a 
single hypothesis is dominant at the root node, we adapt the corresponding row 
of the CPT matrix at each child slightly in the direction of the $\lambda$ message at
the child node for the present observation. Specifically, if hypothesis $i$ “wins” at
the root node, we adjust CPT as follows. First, we decay the internal effective
counts via a decay function:

$$\mathit{counts}_i^{decay} = \gamma\, \mathit{counts}_i + (1 - \gamma)$$

The decayed count is used as a “past weight” for the adjustment, and is the
effective number of times this hypothesis has been recently observed. The CPT
row is first converted to effective counts for each child state, and the present
observation is added as an additional count distributed over the same states.
Then the row elements are divided by the row sum so that the adjusted row has
unit sum. This is accomplished by the following equation:

$$\mathrm{CPT}_{ij}^{adj} = \frac{\mathit{counts}_i^{decay} \times \mathrm{CPT}_{ij} + \lambda_j}{\sum_k \left( \mathit{counts}_i^{decay} \times \mathrm{CPT}_{ik} + \lambda_k \right)}$$

Finally, the internal counts are recomputed for all parent states.

By this procedure, the effective count never decays below 1.0 (if the hypothesis
is never observed) and never grows beyond a bound set by the decay factor (if the
hypothesis is always observed). We typically choose the decay factor so that the
effective count grows to between 200 and 1000 observations. Observations for
frequently seen hypotheses have a smaller CPT adjustment than do observations
for rare hypotheses. In
addition, since only “winning” hypotheses cause a potential CPT adjustment, 
our system has one key advantage over other statistical ID systems. A large 
number of observations for a hypothesis corresponding to an attack will not be 
considered “normal” no matter how frequently it is observed, as its adjustment 
only reinforces the corresponding internal attack hypothesis model in the system. 
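
The decay and row adjustment steps can be sketched in Python as follows. The
sketch covers only those two steps; the function names and numeric values are
assumptions, the $\lambda$ message is assumed to already have unit sum, and the
final recomputation of the internal counts is deliberately left out.

```python
def decay_count(count, gamma):
    """Decay an effective count toward its floor of 1.0."""
    return gamma * count + (1.0 - gamma)

def adjust_row(cpt_row, decayed_count, lam):
    """Blend a CPT row with the present observation and renormalize.

    The row is weighted by the decayed effective count ("past weight"), the
    lambda message contributes one additional count, and the result is
    divided by its sum so that it remains a distribution.
    """
    raw = [decayed_count * p + l for p, l in zip(cpt_row, lam)]
    total = sum(raw)
    return [r / total for r in raw]

gamma = 0.99                      # hypothetical decay factor
row = [0.7, 0.2, 0.1]             # CPT row for the winning hypothesis
lam = [0.0, 0.0, 1.0]             # child observed in state 2

# A hypothesis with a large effective count moves only slightly ...
frequent = adjust_row(row, decay_count(9.0, gamma), lam)
# ... while one at the floor count of 1.0 moves halfway to the observation.
rare = adjust_row(row, decay_count(1.0, gamma), lam)
```

The comparison of the two rows shows the property noted above: the same
observation shifts a rarely seen hypothesis much more than a frequently seen
one.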

5.2 Dynamic Hypothesis Generation 

Another form of adaptation is the potential ability to add a state. Naive Bayes 
models such as the one described above work well in practice as classifiers, and 
are typically trained with observations for which the true class is known. Dy- 
namic hypothesis generation as described here takes on a more difficult problem, 
namely, the situation where the data cases are unlabeled and even the underlying 
number of hypothesis states is unknown. In this situation, it is legitimate to ask 
if a system can self-organize to a number of hypotheses that adequately separate 
the important data classes. In this respect, the ability to separate attack classes 
A and B from each other is less important than the ability to separate both A 
and B from the set of nonattack classes. 

To build this capability, we need to enable the system to add hypotheses at 
the root node (the reader will recall that the root node state value is not directly
observable). As a configuration option, the system will create a “dummy state”
at the root node (or more generally, at any node that is not directly observable),
with an effective count of 1. If this node has children, a new CPT row is added
at each child. We use a uniform distribution over the child states (each element
has value $1/n$, where $n$ is the number of child states) for this CPT row at present.

Adding a state then proceeds as follows. The inference mechanism is applied 
to an observation, and a posterior belief is obtained for the dummy state as if 
it were a normal state. If this state “wins”, it is promoted to the valid state 
class and the CPT rows for all children are modified via the CPT adjustment 
procedure described above. Note that since the effective count of the dummy 
state is 1, the adjustment makes the CPT rows look 50% like the observation. Then a new
dummy state is added, allowing the system to grow to the number of root node 
states that adequately describe the data. This dummy state is not to be confused 
with the OTHER ATTACK hypothesis, for which there is an initial model of 
nonspecific anomalous behavior (e.g., moderate error intensity). 
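
A minimal Python sketch of the dummy-state mechanism is given below. It assumes
each child CPT is stored as a list of rows (one per root state) and that the
root prior is extended elsewhere; the function names and data layout are
hypothetical and chosen only to illustrate the uniform initialization and the
promotion step.

```python
def add_dummy_state(child_cpts, counts):
    """Append a uniform CPT row at every child and an effective count of 1."""
    for cpt in child_cpts:
        n = len(cpt[0])
        cpt.append([1.0 / n] * n)
    counts.append(1.0)
    return len(counts) - 1            # index of the new dummy state

def promote(child_cpts, counts, dummy, lambdas):
    """Promote a winning dummy state, then create a fresh dummy state.

    With an effective count of 1 the decay step leaves the count unchanged,
    so each adjusted row is the average of the uniform row and lambda.
    """
    for cpt, lam in zip(child_cpts, lambdas):
        raw = [counts[dummy] * p + l for p, l in zip(cpt[dummy], lam)]
        total = sum(raw)
        cpt[dummy] = [r / total for r in raw]
    return add_dummy_state(child_cpts, counts)

# One valid root state and one child with four states (hypothetical values).
child_cpts = [[[0.4, 0.3, 0.2, 0.1]]]
counts = [5.0]

dummy = add_dummy_state(child_cpts, counts)
new_dummy = promote(child_cpts, counts, dummy, lambdas=[[0.0, 0.0, 1.0, 0.0]])
promoted_row = child_cpts[0][dummy]   # half uniform, half observation
```

The promoted row ends up halfway between the uniform row and the observation,
and the table again has exactly one dummy state available for the next
unexplained observation.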

There are two ways to exploit the hypothesis generation capability. In the 
first, we initialize the system with the normal and attack hypotheses described 
above, using CPTs derived from our own domain expertise. We observe that 
the system does adjust the CPTs somewhat, but does not choose to add more 
hypotheses when running in this fashion. From this, we tentatively conclude that 
no more than 12 hypotheses are needed to classify these data. 

Our next experiment examined the other extreme. We initialized the system 
with a single valid hypothesis and a dummy hypothesis at the root node. We 
then presented a week of normal (attack-free) data, and the system generated
two valid states. As these states were generated, the CPTs were adjusted according
to the procedure previously outlined. We then arbitrarily decided that
any new states learned would be reported as potential attacks, and presented 
data known to contain attacks. The system added 2 new states, which captured 
the attacks seen previously by the 11-state expert-specified model. There were 
a few false alarms, but well under the Lincoln Laboratory guideline of 10 per 
day for operational usefulness. Therefore, with the capabilities of adaptation via 
reinforcement as well as state space expansion described above, it is in fact pos- 
sible to start the system with essentially no initial knowledge. It then organizes 
to an appropriate number of hypotheses and CPT values. Interestingly, with only
4 root node hypothesis states, this system does nearly as well at separating the
important classes (here, attack versus nonattack) as the expert-specified model.
Normal data is adequately represented by two states, and the variety of attack 
data by two abnormal states. While this does tend to separate important normal 
and attack classes into separate hypotheses, explaining the result is more diffi- 
cult. Nonetheless, this minimal knowledge approach does remarkably well, and 
is a very favorable indicator of the generalization potential of our methodology. 

In between inference steps, the belief state over session class passes through a 
Markov transition model, so as to yield a pre-observation belief state immediately 
before the next inference step. The following sections provide more detailed 
discussion of the observables used, the state transition, and the Bayes update 
mechanism. 

6 State Transition 

As a simplifying assumption, the states observed for the respective variables 
are considered to be independent of what was observed for these variables in 
past inference intervals, given the session class. In addition, given the value of 
the session class in the current interval, X is independent of any other observ- 
able variable Y. In other words, for all observable variables X, Y and inference 
intervals 0 to k, we have 



$$P(X_k = x \mid \mathit{Sess\_class}_k = s, X_{k-1} \ldots X_0, Y_{k-1} \ldots Y_0) = P(X_k = x \mid \mathit{Sess\_class}_k = s)$$

The evolution of session class over inference intervals is modeled as a discrete 
time-and-state Markov process. The transition matrix is a convex combination 
of an identity matrix (to express state persistence) and a matrix whose rows 
are all equal to some prior distribution over the possible values of session class 
(to express the tendency of the process to decay to some prior state). In other
words, for some $0 \leq \gamma \leq 1$, the transition matrix $M$ is given by

$$M = \gamma I + (1 - \gamma) P$$

where $I$ is an identity matrix and each row of $P$ is given by

$$P_{i\cdot} = \mathrm{PRIOR}$$

and PRIOR is a prior distribution over possible values $j$ for session class, that
is,

$$\mathrm{PRIOR}_j = \text{Prior probability}(\mathit{Sess\_class} = j)$$

$M_{ij}$ is the probability that if the process is currently in state $i$ it will be in
state $j$ at the next event. More generally, if POST_BEL is our current belief
state (a distribution over the possible state values, given the evidence up to and
including this time interval), multiplying it on the right by $M$ redistributes our belief
to obtain the prior belief before the next observation:

$$\mathrm{PRE\_BEL}_k = \mathrm{POST\_BEL}_{k-1} M$$

We manipulate the parameter $\gamma$ to capture, albeit imperfectly, the continuous
nature of the underlying process. We typically invoke the inference function every 
100 events within a session, and always when the session enters the idle state. 
Some sessions are less than 100 events in total, while others (particularly many 
denial-of-service attacks) consist of tens of thousands of events in a very short 
time interval. In the latter case, even though many inference steps are invoked, 
we prefer to have a moderately high persistence parameter (about 0.75) because 
very little time has elapsed. If the parameter is 0, the belief reverts to the prior 
at each event. 

It can be shown that, unless $\gamma$ is unity, iteratively multiplying $M$ by itself
results in a matrix that approaches $P$, that is,

$$\lim_{n \to \infty} M^n = P$$

In practice, this limit is nearly reached for fairly small values of n. The result 
of this observation is attractive from the intuitive standpoint: in the absence 
of reinforcing evidence from subsequent events, the belief distribution tends to 
revert to the prior. 
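
The reversion to the prior can be checked numerically with a short Python
sketch. The two-state prior and the starting belief are assumptions made for
illustration; the persistence value of 0.75 is the one mentioned above.

```python
def transition_matrix(prior, gamma):
    """M = gamma * I + (1 - gamma) * P, where every row of P equals the prior."""
    n = len(prior)
    return [[gamma * (1.0 if i == j else 0.0) + (1.0 - gamma) * prior[j]
             for j in range(n)]
            for i in range(n)]

def propagate(belief, m):
    """Row vector times M: the pre-observation belief for the next interval."""
    n = len(m)
    return [sum(belief[i] * m[i][j] for i in range(n)) for j in range(n)]

prior = [0.9, 0.1]                 # hypothetical (normal, attack) prior
m = transition_matrix(prior, gamma=0.75)

post_bel = [0.0, 1.0]              # fully convinced of an attack
pre_bel = propagate(post_bel, m)   # one step: [0.225, 0.775]

# With no further evidence, repeated propagation drifts back to the prior.
drifted = post_bel
for _ in range(30):
    drifted = propagate(drifted, m)
```

After one step three quarters of the attack belief persists; after thirty
steps without reinforcing evidence the belief is within a small fraction of a
percent of the prior.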

The inference operation at interval $k$ begins by setting the Bayes $\pi$ message
to $\mathrm{PRE\_BEL}_k$. Then the observables over the interval are presented to the leaf
nodes, and the belief state at the root node is extracted. If this is deemed sufficiently
suspicious, alert messages are written both to an alert log and in IDIP
format.



7 Results 

7.1 Lincoln Laboratory 1999 Evaluation Study 

We have run our model against the TCP dump data from the 1999 Lincoln Labo- 
ratory IDEVAL data sets [7]. It is highly effective against floods and nonstealthy 
probe attacks, and moderately effective against stealthy probe attacks. 

This data simulates activity at a medium-size LAN with typical firewalls and 
gateways. Traffic generators simulate typical volume and variety of background 
traffic, both intra-LAN and across the gateway. Attack scripts of known types
are executed at known times, and the traffic (a mix of normal background as
well as attack) is collected by standard utilities, such as TCPDUMP. 

For this prototype we examined external-to-internal traffic using the TCP/IP
protocol. This means that console attacks, insider attacks, and attacks exploiting
other protocols such as IDP and UDP are invisible. These are not theoretical
limitations of eBayes, and we intend to include the UDP protocol in the near
future. However, this did limit the attacks that were visible to the system. The fourth
week of the data set was considered the most difficult, as it contained the most
stealthy attacks. We detected three visible portsweeps and missed one that
accessed 3 ports over 4 minutes with no errors. All of the portsweeps in this data
set are stealthy by the standards of the Lincoln training data and the week 5
data (we detect 100% of nonstealthy sweeps). A Satan attack and a TCPRESET
attack are also detected as portsweeps. This particular Satan attack was run in
a mode where it in fact is characteristic of a portsweep. For the TCPRESET,
the portsweep hypothesis slightly edges out the OTHER hypothesis. Other
detected attacks in this data include MAILBOMB and PROCESSTABLE (both
100%), as well as three password-guessing attacks (one detected as OTHER, two as
DICTIONARY). The latter three detections demonstrate the power of the approach.
They were not in the set of attacks that Lincoln thought should be detected by 
this sensor, so we initially considered them false alarms. Further review of the 
full attack list indicated that they were in fact good detections, even though at 
that time we had no DICTIONARY hypothesis and they were called OTHER. 
By elucidating characteristics of these attacks, we added the DICTIONARY 
hypothesis (indicative of password guessing), which now captures two of these 
attacks and is a close second to OTHER as a classification for the third. Also, 
one of these attacks was detected first by probabilistic methods (eStat and later 
eBayes) because the eXpert sensors had no signature for it. This signature has 
now been added, but the generalization potential of probabilistic detection is 
nonetheless clear. 

7.2 Real-World Experience 

We have eBayes-TCP active on our own TCP gateway, and it has proved to 
be stable for indefinite periods of time. The TCP event generator, EMONTCP, 
and Bayes inference components require about 15 MB on a FreeBSD platform,
and never use more than a few percent of the CPU. For real-world traffic, we of 
course have no ground truth, but the results have nonetheless proved interesting 
to us in the sense of scientific experimentation, as well as being of practical 
interest to our system administrators. 

Our initial observation was that, not surprisingly, real-world data contains 
many failure modes not seen in a set such as the IDEVAL data described above. 
For example, we regularly observe a pattern of http sessions of moderate or long 
duration in which a significant number of connections terminate abnormally, 
but on such a time scale and in such modes that we are fairly certain they 
are not malicious. To capture these sessions, we decided to add the HTTP_F 
hypothesis (for failed http). This reduced the alert volume to a manageable
15 or so per day. A representative two-week period comprised about 470,000
connection events, grouped by the session model into about 60,000 sessions of 
which 222 produced alerts. It is important to point out that many of these are 
almost certainly attacks, consisting of IP and probe sweeps and some attempted 
denials of service. Some of the false alert mechanisms are understood and we are 
actively working to improve system response to these without being too specific 
(for example, ignoring alerts involving port 113 requests, which are screened in 
our environment but will be seen from normal mail clients). 

7.3 The Utility of Learning 

The learning procedures described above have proven useful in our experimenta- 
tion, guiding us both in refinement of existing hypotheses as well as developing 
new hypotheses for both normal and attack modalities. However, we have ob- 
served better operation if the adaptive capability is disabled, for several reasons. 
First, attacks and alert-worthy events are a very small fraction of total traffic 
in a real-world setting, so that learning an attack modality that may only be 
seen once is problematic. Second, we found that the normal hypotheses become 
“hardened” so as to be relatively intolerant of erroneous outcomes. The fraction 
of such outcomes for non-malicious reasons is too high to be tolerable from an 
alert standpoint, but is too low to permit sufficient “breathing room” if adap- 
tation is permitted indefinitely. For the present, therefore, we run the system in 
adaptive mode to identify unanticipated modalities and large CPT deviations
from what is observed in true traffic. We then take the results of this phase and
moderate them with our judgement (sanding the corners off very hardened hypotheses,
so to speak) and arrive at a batch specification of the CPT. We then verify
that this new encoding remains sensitive against simulated datasets (such as the 
Lincoln data). At present, we detect the most attacks we have ever detected in 
the Lincoln data, and detect alert-worthy events in our real-world data with an 
acceptable level of apparent false alerts. 

8 Summary 

We have described the eBayes monitoring capability, which employs Bayesian 
inference steps with transition models between inference to assess whether a 
particular burst of traffic contains an attack. A coupled component monitors 
availability of valid services, which are themselves learned via unsupervised dis- 
covery. 

The efficacy of this system was demonstrated by results from the Lincoln 
Laboratory Intrusion Detection Evaluation data, and also by a live operation on 
a real-world site for weeks at a time. 

This provides us with several important new capabilities: 

- Probabilistic encoding of attack models provides a complementary capability
  to anomaly detection and signature analysis, retaining the generalization
  potential of the former and the sensitivity and specificity of the latter.

- We now potentially detect distributed attacks in which none of the attack
  sessions are individually suspicious enough to generate an alert. This
  comprises correlation by aggregation.

- Once a successful denial of service has taken place, we are much less likely
  to generate false alerts for nonmalicious clients requesting the service during
  the attack (we refer to these clients as “collateral damage”). This form of
  correlation fuses the belief that an attack is in progress with the symptom
  of the attack (the service is disabled when the attack achieves its objectives)
  to explain away subsequent alerts from “collateral damage” sessions. As
  such, the system correlating symptoms and attacks provides effective false
  alarm reduction, while still providing the administrator with an alert for the
  original attack as well as an indication of the status of the victim host/port.

We continuously run this system along with our TCP session monitor on 
our own TCP gateway. While we do not have ground truth for this traffic, we 
regularly identify probe attacks and “spidering” activity, as well as the occasional 
DOS attempt. We also detect service outages and recovery for what appear to 
be nonmalicious faults. 

Abstract. In practice, most computer intrusions begin by misusing programs
in clever ways to obtain unauthorized higher levels of privilege.
One effective way to detect intrusive activity before system damage is 
perpetrated is to detect misuse of privileged programs in real-time. In 
this paper, we describe three machine learning algorithms that learn the 
normal behavior of programs running on the Solaris platform in order to 
detect unusual uses or misuses of these programs. The performance of 
the three algorithms has been evaluated by an independent laboratory in 
an off-line controlled evaluation against a set of computer intrusions and 
normal usage to determine rates of correct detection and false alarms. A 
real-time system has since been developed that will enable deployment 
of a program-based intrusion detection system in a real installation. 



1 Introduction 

Today, most commercial intrusion detection systems monitor network packets for 
unusual patterns, or patterns of known suspicious actions. Recent advances in 
high bandwidth local area networks have presented significant challenges to per- 
forming network monitoring in real-time. In addition, as end-to-end encryption 
protocols are adopted enterprise wide, many of today’s network-based intrusion 
detection systems will be rendered obsolete. 

Host-based intrusion detection systems attempt to detect computer intru- 
sions by monitoring audit trails created on host computer systems. Many mod- 
ern day operating systems provide audit trails for processes that run on the 
machine. On the Solaris platform, the Basic Security Module (BSM) provides 
a configurable audit manager that facilitates recording system events requested 
by executing processes. 

We leverage this audit reporting mechanism in this research. The motivation 
for our work is that a large class of computer intrusions involves program misuse. 
Most program misuse attacks exploit privileged programs in clever ways in order 
to gain unauthorized privileges that are subsequently used to commit malicious 
acts of sabotage or data theft. Buffer overrun attacks are the most frequent form 
of program misuse attacks. Other types of program misuse attacks include using 
rarely used features (such as debug features), exploiting race conditions, and
triggering Trojan horse functionality in order to gain higher privileges. 

When a program is misused, its behavior will differ from its normal usage. 
Therefore, if the normal range of program behavior can be adequately and com- 
pactly represented, then behavioral features captured by audit mechanisms can 
be used for intrusion detection. 

A well-recognized failing of today’s commercial intrusion detection systems 
is that they cannot detect novel attacks against systems, and they often fail to 
detect variations of known attacks. The reason is that most commercial intrusion 
detection systems detect attacks by matching audit events against well-known 
patterns of attacks. This approach is known as signature-based detection. The 
problem with a signature-based detection approach is that it is reactive by na- 
ture. Once a new form of intrusion is developed, it is often perpetrated against 
many systems before its signature is captured, codified, and disseminated to individual
detection sensors. In a worm-type infection, millions of machines can
potentially be compromised before a signature-based system can be upgraded 
with the appropriate signature. 

To detect novel attacks against systems, we develop anomaly-based systems 
that report any unusual use of system programs as potential intrusions. The 
advantage of this approach is that both known attacks and novel attacks are 
detected. The disadvantage is that if the training mechanism for the detection 
sensor is not robust, a large number of false alarms may be reported. In other 
words, perfectly legitimate behavior may be reported as intrusions. 

Another large challenge in intrusion detection is to generalize from previously 
observed behavior (normal or malicious) to recognize similar future behavior. 
This problem is acute for signature-based misuse detection approaches, but also 
plagues anomaly detection approaches that must be able to recognize future 
normal behavior that is not identical to past observed behavior, in order to 
reduce false positive rates. 

In the research reported here, we address both challenges: detecting novel at- 
tacks as well as generalizing from previously observed behavior in order to reduce 
the false positive rate to acceptable levels from an administration standpoint. 

We develop an anomaly detection system that uses machine learning au- 
tomata to learn the normal behavior for programs. The trained automata are 
then used to detect possibly intrusive behavior by identifying significant anoma- 
lies in program behavior. The goal of these approaches is to be able to detect not 
only known attacks but also future novel attacks using off-the-shelf
auditing mechanisms provided by the operating system vendor. 

We develop three algorithms for learning program behavior profiles and de- 
tecting significant deviations from these profiles. The algorithms were evaluated 
by an independent laboratory in a controlled off-line experiment to determine 
their effectiveness against program misuse attacks. The performance of the algo- 
rithms is presented as a measure of the probability of correct detection against 
the probability of false alarm. 
Finally, in Section 5, we describe a real-time system that implements one of
the learning algorithms to detect intrusions in real-time. 

2 Related Work 

Analyzing program behavior profiles for intrusion detection has recently emerged 
as a viable alternative to user-based approaches to intrusion detection (see [11, 
18, 14, 6, 4, 7, 16, 2] for other program-based approaches). Program behavior 
profiles are typically built by capturing system calls made by the program under 
analysis under normal operational conditions. If the captured behavior represents 
a compact and adequate signature of normal behavior, then the profile can be 
used to detect deviations from normal behavior such as those that occur when 
a program is being misused for intrusion. 

For a detailed comparison of our general approach to program-based intrusion 
detection with those of others in this area, please see [10]. 

3 Three Machine Learning Algorithms for Anomaly 
Detection 

As described in the introduction, we are interested in detecting novel attacks 
against systems by detecting deviations from normal program behavior. To this 
end, we have developed three machine learning algorithms to train automata 
to learn a program’s normal behavior. The trained program automata are subsequently
used to detect program misuse. The three algorithms are: an Elman
recurrent artificial neural network, a string transducer, and a finite state tester. 
Each algorithm is described next. 

3.1 Elman Recurrent Neural Network 

The goal in using artificial neural networks (ANNs) for anomaly detection is 
to be able to generalize from incomplete data and to be able to classify online 
data as being normal or intrusive. An artificial neural network is composed of 
simple processing units, or nodes, and connections between them. The connection 
between any two units has some weight, which is used to determine how much one 
unit will affect the other. A subset of the units of the network acts as input nodes, 
and another subset acts as output nodes. By assigning a value, or activation, to 
each input node, and allowing the activations to propagate through the network, 
a neural network performs a functional mapping from one set of values (assigned 
to the input nodes) to another set of values (retrieved from the output nodes). 
The mapping itself is stored in the weights of the network. 

We originally employed ANNs because of their ability to learn and general- 
ize. Through the learning process, ANNs develop the ability to classify inputs 
from exposure to a set of training inputs and application of well defined learning 
rules, rather than through an explicit human-supplied enumeration of classification
rules. Because of their ability to generalize, ANNs can produce reasonable
classifications for novel inputs (assuming the network has been trained well).
Further, since the inputs to any node of the ANN used for this work could be
any real-valued number, no sequence of BSM events could produce an encoding
that would fall outside of the domain representable by the ANN.

Fig. 1. In each of the examples above, the nodes of the ANNs are labeled as
input nodes (I), hidden nodes (H), output nodes (O), or context nodes (C).
Each arc is unidirectional, with direction indicated by the arrow at the end of
the arc. A) A standard feed-forward topology. B) An Elman network.

In order to maintain state information between inputs, we require a recur- 
rent ANN topology. A recurrent topology (as opposed to a purely feed-forward 
topology) is one in which cycles are formed by the connections. The cycles act as 
delay loops — causing information to be retained indefinitely. New input interacts 
with the cycles, affecting both the activations propagating through the network 
and the activations in the cycle. Thus, the input can affect the state, and the 
state can affect the classification of any input. 

One well known recurrent topology is that of an Elman network, developed 
by Jeffrey Elman [5]. An Elman network is illustrated in Figure 1. The Elman
topology is based on a feed-forward topology — it has an input layer, an output 
layer, and one or more hidden layers. Additionally, an Elman network has a set 
of context nodes. Each context node receives input from a single hidden node 
and sends its output to each node in the layer of its corresponding hidden node. 
Since the context nodes depend only on the activations of the hidden nodes from 
the previous input, the context nodes retain state information between inputs. 

We employ Elman nets to perform classification of short sequences of events 
as they occur in a larger stream of events. Therefore, we train our Elman networks
to predict the next sequence that will occur at any point in time. The $n$th
input, $I_n$, is presented to the network to produce some output, $O_n$. The output
$O_n$ is then compared to $I_{n+1}$. The difference between $O_n$ and $I_{n+1}$ (that is,
the sum of the absolute values of the differences of the corresponding elements
of $O_n$ and $I_{n+1}$) is the measure of anomaly of each sequence of events. In
other words, the anomaly measure is the error in predicting the next input in
the sequence. Because the context nodes carry state forward between inputs, the
classification of a sequence of events is also affected by events
prior to the earliest event occurring within the sequence.
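
The prediction-error measure can be illustrated with a minimal sketch. The single hidden layer, the tanh activation, the weight layout, and all function names below are assumptions made for illustration; the text does not specify the network dimensions or the encoding of BSM events.

```python
import math

def elman_step(x, context, w_in, w_ctx, w_out):
    """One forward pass of a single-hidden-layer Elman network (illustrative).

    x       -- input vector I_n
    context -- hidden activations saved from the previous input
    w_in    -- weights, one row per hidden node, one column per input node
    w_ctx   -- weights, one row per hidden node, one column per context node
    w_out   -- weights, one row per output node, one column per hidden node
    Returns (output O_n, new context).
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(w_in[h], x))
                  + sum(w * ci for w, ci in zip(w_ctx[h], context)))
        for h in range(len(w_in))
    ]
    output = [sum(w * hi for w, hi in zip(row, hidden)) for row in w_out]
    # Each context node copies the activation of its hidden node.
    return output, hidden

def anomaly(o_n, i_next):
    """Sum of absolute element-wise differences between O_n and I_{n+1}."""
    return sum(abs(o - i) for o, i in zip(o_n, i_next))

def anomaly_scores(inputs, w_in, w_ctx, w_out):
    """Prediction error for each input, given the input that follows it."""
    context = [0.0] * len(w_in)
    scores = []
    for n in range(len(inputs) - 1):
        o_n, context = elman_step(inputs[n], context, w_in, w_ctx, w_out)
        scores.append(anomaly(o_n, inputs[n + 1]))
    return scores
```

Because the context vector is threaded through successive calls, the score for a given input depends on everything the network has seen so far, not only on the current input.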

3.2 String Transducer 

A string transducer is an algorithm that associates a sequence of input sym- 
bols with a series of output symbols. String transducers are most often used in 
computational biology and computational linguistics, where they are usually im- 
plemented using finite automata whose transitions or states are associated with 
output symbols. In the current context, we use automata as well, but the input 
sequence is a string of BSM events, and the output sequence is a prediction for 
the next several events. 

Our use of string transducers as intrusion detectors is based on an examina- 
tion of the probabilities of the output symbols at each state. During training, we 
estimate the probability distribution of the symbols at each state, and during 
testing, deviations from this probability distribution are taken as evidence of 
anomalous behavior. 

Our implementation of this idea is relatively simple. We use a finite automaton
whose states correspond to n-grams in the BSM data, and the output
symbols associated with each state are also BSM ℓ-grams (for ℓ < n). More
specifically, the output symbol represents sets of ℓ BSM events that may be
seen when the automaton is in a given state. During training, our goal is to
gather statistics about these successor ℓ-grams; we estimate the probability of
each ℓ-gram by counting.
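
Estimation by counting might look like the following sketch. It assumes that the successor of a state is the run of ℓ events immediately following the state's n-gram in the audit stream; the function name and data layout are hypothetical.

```python
from collections import Counter, defaultdict

def train_successor_probs(events, n, l):
    """Estimate P(successor l-gram | state) by counting.

    A state is the n-gram of events just seen; its successor is assumed to be
    the l events that immediately follow it in the stream.
    """
    counts = defaultdict(Counter)
    for i in range(len(events) - n - l + 1):
        state = tuple(events[i:i + n])
        successor = tuple(events[i + n:i + n + l])
        counts[state][successor] += 1
    # Convert raw counts into relative frequencies per state.
    return {
        state: {succ: c / sum(ctr.values()) for succ, c in ctr.items()}
        for state, ctr in counts.items()
    }

probs = train_successor_probs(
    ["open", "read", "open", "read", "open", "close"], n=2, l=1)
# probs[("read", "open")] holds two successors, each with frequency 0.5
```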

During actual intrusion detection, the deviations of the successor ℓ-grams from
their expected values are used as anomaly scores. Of course, the anomaly scores
are usually non-zero, but if the program is behaving normally these deviations 
should average out over time. 

In the ideal case, it can be shown that the anomaly scores are uncorrelated if 
the probability distributions have, in fact, been correctly estimated (this is due 
to the fact that the deviations are then an innovations process; see [1]). That 
means that if we subtract the mean anomaly score for each state from the actual 
anomaly scores generated there, the result is zero-mean white noise. 

If these values are integrated over a sufficiently long period, the result should 
be close to zero if the program is behaving normally. However, if abnormal 
program behavior results in a significant deviation of the successor ℓ-grams from
their expected values, then the resulting scores will not integrate to zero, and 
this fact can be used to detect anomalous behavior. 
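
As a rough sketch of this integration step, the following code subtracts each state's mean score and sums the residuals over a sliding window. The window length, the threshold, and the use of an absolute-value test are illustrative assumptions, not parameters given in the text.

```python
def windowed_sums(scores, means, window):
    """Sliding-window sums of mean-subtracted anomaly scores.

    scores -- list of (state, anomaly_score) pairs observed during detection
    means  -- dict mapping each state to its mean score on training data
    """
    residuals = [score - means[state] for state, score in scores]
    return [sum(residuals[i:i + window])
            for i in range(len(residuals) - window + 1)]

def is_anomalous(scores, means, window, threshold):
    """Flag the stream if any window integrates to a value far from zero."""
    return any(abs(s) > threshold
               for s in windowed_sums(scores, means, window))
```

A longer window averages out more of the noise in normal behavior but delays detection, which is the tradeoff discussed in the list of practical problems.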

In practice, there are obviously a number of factors preventing the realization 
of this ideal case: 
1. If the probabilities of the successor ℓ-grams have not been correctly estimated,
then the deviations may not be uncorrelated.

2. During detection, n-grams may be encountered that do not correspond to 
any known state because they were not seen during training. 

3. An intrusion may not result in a systematic deviation from the expected 
ℓ-gram values; in other words, the intrusion may look normal. Although this
seems unlikely, we cannot prove that all intrusions really cause the necessary 
deviations. 

4. The window of integration needed to get sufficiently low anomaly scores 
during normal behavior may be large. This delays the detection of anomalies 
(though if it prevented them from being detected we would arguably be in 
case 3). 



The fourth is an intrinsic problem of change detection [15]; there is an inevitable 
tradeoff between the time to detection and the susceptibility to false positives. 
The third problem is also, in some sense, unavoidable; it seems unlikely that 
we could guarantee the detection of all intrusions without assuming something 
about the nature of those intrusions, which is contrary to our assumptions. (We 
may, of course, be able to make guarantees for certain classes of intrusions). 

The second problem cited above is more directly related to our specific appli- 
cation. It results from having too little training data to characterize all states. It 
dictates that states should not be too highly specialized, since such specialization 
makes it less likely for all states to be seen during training. 

The first problem dictates a wise choice of states. For example, it has been 
observed that programs go through different phases of behavior [3], so the probability
of a given ℓ-gram may depend on how far along the program is in its
execution. Thus, states should reflect the state of the program itself. Even if
the distribution of ℓ-grams varies over time, the distribution from a given state
should be constant. Unfortunately, this condition can be best achieved by using 
highly specialized states to avoid having two or more states of the underlying 
program represented by a single state of the automaton. Thus, the solutions to 
the first and second problems are in some sense at odds. This tradeoff between 
expressiveness and ease of training is also well-known in machine learning [19]. 

As we have said, the probability densities of the successor ℓ-grams in a given
state are estimated by counting (that is, we simply count the number of occurrences
of each ℓ-gram in the training data). This approach is feasible with BSM
data because it tends to be fairly regular; the number of BSM ℓ-grams is much
smaller than, say, the number of possible BSM events raised to the ℓth power.

We measure deviations from expected behavior by treating the estimated
probability distribution as a vector, which we first normalize with respect to
the $L_k$ metric,

$$\|x\|_k = \Bigl( \sum_i |x_i|^k \Bigr)^{1/k},$$

for some $k$. When a given ℓ-gram occurs during detection, we treat it as a vector
with a 1 in the position corresponding to the actual ℓ-gram that was seen,
and a 0 in the other positions. The deviation is proportional to the $L_k$ distance
between this vector and the normalized density vector. In other words, if $p_i$ is the
estimated probability of the $i$th ℓ-gram, according to some arbitrary ordering,
then the elements of the normalized probability vector are given by

$$\hat{p}_i = \frac{p_i}{\bigl( \sum_j p_j^k \bigr)^{1/k}},$$

and the deviation $d_i$, reported when the $i$th ℓ-gram is seen during detection, is
given by

$$d_i = 2^{-1/k} \Bigl( (1 - \hat{p}_i)^k + \sum_{j \neq i} \hat{p}_j^k \Bigr)^{1/k}.$$

We treat these as summations over all possible ℓ-grams, though the actual implementation
only has to sum over those that were seen during training since $p_j$
is zero for the others. But if a novel ℓ-gram is seen during testing, this convention
assures that $d_i$ is still defined, and, in fact, its value is just 1.
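
A minimal sketch of this deviation measure follows. The scaling constant is an assumption: the text says only that the deviation is proportional to the L_k distance and that a novel ℓ-gram yields exactly 1, which is satisfied by dividing the distance by 2^(1/k). The function names are hypothetical.

```python
def normalize(probs, k):
    """Scale a probability vector to unit length under the L_k norm."""
    norm = sum(p ** k for p in probs.values()) ** (1.0 / k)
    return {gram: p / norm for gram, p in probs.items()}

def deviation(seen, probs, k):
    """Scaled L_k distance between the indicator vector of `seen` and the
    normalized probability vector.

    Grams absent from `probs` have probability zero, so a novel gram
    contributes |1 - 0|^k at its own position.
    The factor 1/2 inside the root (i.e. 2^(-1/k) overall) is an assumed
    constant chosen so that a novel gram scores 1.
    """
    p_hat = normalize(probs, k)
    total = abs(1.0 - p_hat.get(seen, 0.0)) ** k
    total += sum(p ** k for gram, p in p_hat.items() if gram != seen)
    return (total / 2.0) ** (1.0 / k)
```

With this scaling the deviation ranges from 0, when the state has a single successor and that successor is the one seen, up to 1 for an ℓ-gram never seen in training.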

3.3 State Tester 

The goal of the third algorithm, which we simply call a state tester, is to automatically
create finite automata to represent program behavior. Since data representing 
intrusive behavior is not used during training, the first goal is simply to build 
a finite automaton that accepts all audit sequences in the training data, but 
without being so generous that it accepts all data, or being so rigid that it 
rejects every novel audit sequence after training. 

In [13], finite automata (FA) of this kind were generated largely by hand. 
First, the BSM data was pre-processed so that commonly occurring sequences 
of events could be combined into a single meta-event. Then, the meta-events were 
encoded as an FA. The combination of events into meta-events, called macros 
in that paper, was done manually, and though the paper does not say whether 
the FAs were then also created by hand, it is implied that they were. 

Our approach is to automate the process of inferring finite automata. Something
along these lines is done in [3], where training data is used to learn hidden
Markov models of normal program behavior. This technique proved effective at
the task of intrusion detection, but training (using the Baum-Welch algorithm,
see [17]) was found to be expensive. This raises the question of whether simpler 
algorithms that only infer an FA, and not the transition probabilities associated 
with a Markov model, might also be effective without requiring as much training. 

Below, we present an algorithm for automatically constructing finite au- 
tomata from training data. In this context, it should be noted that the inference 
of finite automata is not intractable, although the automatic inference of finite 
automata is intractable in a number of other settings (cf. [12, 9]). What makes
the problem tractable in the case of anomaly detection is that the requirements 
are simple. The finite automaton merely has to accept any training sequence 
that is not abnormal. Of course, it should also reject abnormal BSM sequences, 
but since there are no abnormal BSM sequences in the training data this re- 
quirement cannot be formalized within the learning algorithm itself. Rather, we 
will evaluate the performance of the FAs empirically. 

By way of example, we could create an FA with a single state, where every 
BSM event results in a transition from that state back to itself. We could also 
create an FA with no cycles that accepts exactly the BSM sequences occurring 
in the training data. 

The first approach is too weak because it tends to accept any sequence of BSM 
events, and thus fails to notice abnormal BSM sequences. The second approach 
is probably too strong, because it rejects any sequence as being abnormal unless 
exactly the same sequence was seen during training. Our goal is to create a 
reasonably expressive FA, but one that can still generalize. Of course, this is a 
qualitative requirement. 

The first issue is how to define the states of the automaton. The technique 
reported in this paper associates each state with one or more n-grams of BSM 
data, where n is a parameter of the learning algorithm. For example, the FA 
might have a state corresponding to the event sequence lstat, open, ioctl,
and enter that state whenever the sequence lstat, open, ioctl is seen. The
idea, however, is to be parsimonious in the creation of new states, and not simply 
have one state in the FA for every n-gram of BSM events. Instead, we will have 
more than one n-gram assigned to most of the states. 

During training, separate automata are created for the different programs 
whose audit data are available for training. As with the intrusion detection 
systems of [8], the training algorithm is presented with a series of n-grams taken 
from non-intrusive BSM data for a given program. Conceptually, the goal of the 
automaton is to predict the entire n-gram based on the automaton’s current 
state and on the first ℓ audit events in the n-gram, ℓ < n.

The FA’s transitions correspond to specific sequences of ℓ audit events, and
each state corresponds to one or more n-grams. We say that the FA predicts an
n-gram G if there is a transition from the current state to the state corresponding
to G, and if that transition is labeled with the first ℓ elements of G. Thus, the
automaton predicts a set of states, and these states are simply the ones reachable
by transitions labeled with the first ℓ elements of G. If this set is empty (i.e.,
there is no transition labeled with the first ℓ elements of G) then we say that
the FA makes no prediction at all. Otherwise, a prediction error occurs if the
predicted set of states does not contain the one associated with G.

During training, an incorrect prediction results in the creation of a new tran- 
sition and possibly a new state. The training algorithm starts with an FA having 
a single state and no transitions. We say that the FA is initially in this state. 
Whenever a new training n-gram is seen, there are three possibilities: 
1. The current state has an outgoing edge that corresponds to the first ℓ events
in the n-gram, and that edge leads to the correct state (the correct state is
the state that is assigned to the newly obtained n-gram). In this case, the
FA needs no modifications.

2. The current state has outgoing edges that correspond to the first ℓ events in
the n-gram, but none of the edges lead to the correct state. In this case, the
FA may contain a correct state (but no edge from the current state to the
desired state), or else the FA may not even have any state assigned to the
new n-gram.

We simply create a state for the new n-gram if one doesn’t already exist. In
either case, we create a transition from the current state to the new state,
and label that transition with the first ℓ events of the new n-gram (recall
that we will use these ℓ events when trying to make future predictions).

3. The current state has no outgoing edges that correspond to the first ℓ events
in the newly obtained n-gram. If there is already a state assigned to the
newly obtained n-gram, then we simply create a transition to that state,
and label it with the ℓ events as in the previous case.

However, if the new n-gram doesn’t have any state assigned to it, we can
assign any one of the already existing states, or create a new state, without
introducing any prediction errors. Currently, the algorithm just creates a
transition back to the current state, and assigns the new n-gram to the
current state (where it joins whatever n-grams were assigned to that state
previously).

In all three cases, the FA transitions to the state assigned to the new n-gram. 
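
The three training cases can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: the class and method names are hypothetical, n-grams are assumed to come from a sliding window over the event stream, and the behavior of `check` on an n-gram that has no assigned state (stay in the current state) is an assumption, since the text does not describe it.

```python
from collections import defaultdict

def ngrams(events, n):
    """Slide a window of length n over the event stream."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]

class StateTester:
    def __init__(self, n, l):
        assert 0 < l < n
        self.n, self.l = n, l
        self.num_states = 1            # the FA starts with one state, 0
        self.gram_state = {}           # n-gram -> state assigned to it
        self.edges = defaultdict(set)  # (state, l-prefix) -> target states

    def train(self, events):
        cur = 0
        for gram in ngrams(events, self.n):
            prefix = gram[:self.l]
            targets = self.edges.get((cur, prefix), set())
            assigned = self.gram_state.get(gram)
            if assigned is None:
                if targets:
                    # Case 2: matching edges exist but none can be correct,
                    # so create a new state for this n-gram.
                    assigned = self.num_states
                    self.num_states += 1
                else:
                    # Case 3: no matching edge and no assigned state, so the
                    # n-gram joins the current state via a self-loop.
                    assigned = cur
                self.gram_state[gram] = assigned
            # Adds the missing edge in cases 2 and 3; a no-op in case 1.
            self.edges[(cur, prefix)].add(assigned)
            cur = assigned

    def check(self, events):
        """Return (prediction errors, steps with no prediction)."""
        cur, errors, no_prediction = 0, 0, 0
        for gram in ngrams(events, self.n):
            targets = self.edges.get((cur, gram[:self.l]), set())
            state = self.gram_state.get(gram)
            if not targets:
                no_prediction += 1
            elif state not in targets:
                errors += 1
            if state is not None:      # assumption: otherwise stay put
                cur = state
        return errors, no_prediction

fa = StateTester(n=3, l=2)
trace = ["lstat", "open", "ioctl", "close", "lstat", "open", "read", "close"]
fa.train(trace)
```

Replaying the training trace through `check` produces no errors, because every edge needed along the way was added during training. How errors and missing predictions are turned into an alert is left open here.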

4 Performance of Algorithms 

The three algorithms described in the preceding section were implemented 
and evaluated by an independent laboratory, Lincoln Laboratory of the Mas- 
sachusetts Institute of Technology, in the 1999 U.S. Defense Advanced Research 
Projects Agency (DARPA) Intrusion Detection Evaluation. The full extent of 
the experimental setup, the data, the participants, system descriptions, full at- 
tack descriptions, raw scores, and results are available online at the Lincoln 
Laboratory’s Intrusion Detection Evaluation page (http://ideval.ll.mit.edu). In 
this section, we summarize the results of our systems. 

Lincoln established four categories of attacks: Denial of Service (DoS), probe, 
remote-to-local (R2L), and user-to-root (U2R). Within these categories they 
ran several select instances of attacks. Lincoln does not claim these attacks are 
comprehensive of the category of attacks. Rather, the attacks can be considered 
as samples from the attack space within a category. DoS and probe attacks were 
network-based attacks that leave traces in network packet data. Remote-to-local 
attacks involved network-based attacks again, but also included some attacks 
that attempted to misuse host-based programs. User-to-root attacks attempt to 
gain super user privileges on the host machine either by misusing programs or 
by running malicious software. 
While our approach is not exclusive to any single category of attacks as partitioned
by Lincoln, it is best suited to detect user-to-root attacks
according to the Lincoln partitions. Our approach will detect program misuse
attacks regardless of which of the four Lincoln categories the attack falls in, as
long as the attack leaves some trace in the audit data we use. In addition to the 
user-to-root attacks, a few instances of the remote-to-local attacks involved pro- 
gram misuse. So, we also include results from detecting remote-to-local attacks 
in this section. 



Table 1. List of programs monitored by intrusion detection automata

admintool   dhcpcd          kswapd        ping         sperl5.00404  wu.ftpd
allocate    dos             list_devices  procmail     ssh1          xlock
aspppd      eject           lockd         ps           sshd          xscreensaver
at          exrecover       login         pt_chmod     su            xterm
atd         fdformat        lpd           pwdb_chkpwd  suidperl      Xwrapper
atq         ff.core         lpq           rcp          syslogd       ypbind
atrm        ffbconfig       lpr           rdist        tcpd          yppasswd
auditd      fsflush         lprm          rdistd       timed         zgv
automountd  gpasswd         m64config     rlogin       traceroute
cardctl     gpm             mingetty      routed       umount
chage       hpnpd           mkdevalloc    rpcbind      uptime
chfn        untd            mkdevmaps     rpciod       userhelper
chkey       in.*            mount         rpld         usernetctl
chsh        inetd           newgrp        rsh          utmp_update
cron        kcms_calibrate  nispasswd     rusersd      utmpd
crond       kcms_configure  nmbd          rwhod        uu.*
crontab     kerbd           nscd          sacadm       volcheck
ct          kerneld         nxterm        sadmind      vold
cu          kflushd         pageout       sendmail     w
deallocate  klogd           passwd        smbd         whodo

Since our approach involves training program monitors, we must first choose 
which programs to monitor. Most attacks, in practice, are launched against priv- 
ileged programs on network servers. So, our rule was to train program monitors 
on SUID root programs that run on Unix servers. Table 1 lists the programs we
monitor for intrusions. This list is a superset of the program monitors run
against the Lincoln Laboratory data, because not all programs in Table 1 are
exercised by the Lincoln data.

4.1 Performance of Elman Networks 

Figure 2 shows the performance of the Elman networks on the BSM data against 
both U2R (Figure 2(a)) and R2L (Figure 2(b)) attacks. The plots are called
Detection/False Alarm plots by Lincoln Laboratory. Each plot shows the probability
of correct detection versus the false alarm rate per day. Examining the user-to-
root attacks first, it becomes clear that the Elman networks performed very well
against this class of attacks. The Elman networks achieved 100% detection of
attacks very quickly at a false alarm rate of close to 3 per day. This false alarm rate
is considered acceptable in an operational environment and is vastly superior to
current commercial tools.

Fig. 2. Performance of Elman networks on BSM data against User-to-Root
(U2R) and remote-to-local (R2L) attacks
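
To make the axes of such a plot concrete, the sketch below computes one operating point per alert threshold from a list of scored alerts. It is only an illustration: the Lincoln Laboratory scoring procedure is not described here, and the function name, the alert format, and the attack identifiers are hypothetical.

```python
def operating_points(alerts, attack_ids, days):
    """Compute (false alarms per day, detection probability) per threshold.

    alerts     -- list of (score, attack_id) pairs; attack_id is None when
                  the alert does not correspond to any attack
    attack_ids -- set of all attack instances present in the data
    days       -- number of days of audit data covered by the alerts
    """
    points = []
    for threshold in sorted({score for score, _ in alerts}, reverse=True):
        fired = [(s, a) for s, a in alerts if s >= threshold]
        detected = {a for _, a in fired if a is not None}
        false_alarms = sum(1 for _, a in fired if a is None)
        points.append((false_alarms / days, len(detected) / len(attack_ids)))
    return points

# Toy input with made-up scores, purely to show the call shape.
toy_alerts = [(0.9, "attack-1"), (0.8, None), (0.7, "attack-2"), (0.2, None)]
toy_points = operating_points(toy_alerts, {"attack-1", "attack-2"}, days=2)
```

Lowering the threshold moves along the curve toward higher detection at the cost of more false alarms per day.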

A closer examination of the attacks showed that the vast majority of them 
involved program misuse types of attacks such as buffer overrun attacks. How- 
ever, our technique is not limited to buffer overrun attacks. Rather the approach 
is designed to detect any program misuse attack. It turns out that the sample U2R
attacks chosen by Lincoln were all buffer overrun attacks. As more different 
types of program misuse attacks are captured in evaluation sets, we will be able 
to verify this claim in the future. 

The performance of the Elman networks against Lincoln’s remote-to-local 
attacks was not nearly as good, as shown in Figure 2(b). At a rate of approxi- 
mately 10 false alarms per day, we detected roughly 30 percent of R2L attacks. 
If you are willing to accept a false alarm rate of up to 100 per day, the correct 
detection rate goes over 90 percent. However, operationally speaking, that false 
alarm rate is not acceptable. 

In the 1999 evaluation, the R2L attacks run by Lincoln by and large did not
involve program misuse. Thus, most of these attacks fall outside the scope of 
our approach. For example, the guessftp, ftpwrite, and guest R2L attacks 
all involve using the legitimate protocol to either guess passwords or write files 
(when the program was configured to do so). 

Other remote-to-local attacks involved malicious clients acting on behalf of 
an outside perpetrator. Since we only monitor programs we know about, we do 
not detect malicious programs. However, our technique can detect intrusions that
may have been precursors to installing malicious clients. In summary, attacks
that involve programs we do not monitor or attacks that involve normal uses of
programs fall outside the scope of our detection mechanism. The reason we
detected any of them at all is that side effects from the intrusion tend to
show up in other programs we monitor, albeit at a high false alarm rate.

4.2 Performance of String Transducer 

Fig. 3. Performance of string transducer on BSM data against User-to-Root
(U2R) and remote-to-local (R2L) attacks 



Figure 3 shows the performance of the string transducer against U2R and 
R2L attacks. The performance of the string transducer is very close to that of the 
Elman network. At a rate of about 3 false positives a day we detected 100% of 
the user-to-root attacks. What is most significant about this result, however, is 
that since the training time for the string transducer is orders of magnitude less 
than that of the Elman neural network, we can achieve comparable detection 
performance with significantly less training time. Where training the Elman 
nets takes on the order of thousands of minutes for all the programs monitored, 
training the string transducer and the state tester takes on the order of tens of 
minutes. 

Again, the performance against the R2L attacks was not very good, for the
same reasons. At the same false positive rate we detected about 15% of the
remote-to-local attacks. When we raised the false positive rate to about 9 false
positives a day, we detected about 35% of the remote-to-local attacks. The reason
our string transducer failed to detect many R2L attacks is the same as for
the Elman network: most of the R2L attacks launched by Lincoln Laboratory
did not misuse programs, or they involved malicious clients.

4.3 Performance of State Tester 




Fig. 4. Performance of state tester on BSM data against User-to-Root (U2R) 
and remote-to-local (R2L) attacks 



The performance of the state tester is shown in Figure 4. At a rate of about 
9 false positives a day we detected 100% of the user-to-root attacks. Figure 4a 
shows a more gradual progression towards 100 percent detection, compared to 
Figure 2a, whose progression looks more like a unit step function. The upshot 
is that with the state tester, one can tune the performance of the system more 
easily to meet the acceptable detection requirements within the organization’s 
tolerance to false alarms. On the other hand, the performance of the Elman 
network indicates, more or less, all-or-nothing detection, which does not leave 
much to tune. However, it is important not to overgeneralize, as the results may
vary from experiment to experiment depending on the attacks launched and the 
training data. 

At a false alarm rate of about 9 per day, we detected about 35% of the
remote-to-local attacks. While the state tester did not perform as well as the
Elman networks or the string transducer, its performance is still good by
existing commercial standards. We believe the state tester had a higher false
alarm rate because the likelihood of falling off the deterministic automaton is
greater than for the string transducer or the Elman neural network. We believe,
though, that its performance can be improved with more robust training data.

Overall, the performance of our systems on user-to-root attacks is good,
roughly 100 percent detection at a rate of less than 10 false positives per day. 
Two of our systems, the Elman neural network and the string transducer, were 
able to detect all user-to-root attacks with fewer than four false alarms per day. 
This, combined with the fact that our systems can be trained in much less time 
than it takes to configure a rule-based intrusion detection system, makes our 
approach very promising. 

Our systems did not fare as well on remote-to-local attacks, but this was 
because many of the remote-to-local attacks Lincoln launched did not involve 
program misuse. Thus, not all such attacks are in the scope of our approach. 
Conversely, our approach will be able to detect attacks that fall in other cat- 
egories, so long as the attacks involve program misuse. Thus, the scope of our 
detection has more to do with how an attack affects program behavior than it
has to do with other types of attributes. While we do not claim to detect all 
attacks, we do claim the scope of our detection mechanism to cover those attacks 
that misuse programs. 

5 Implementing a Real-Time Intrusion Detection Tool 

While studying the performance of the algorithms off-line is a necessary step to
understand the strengths and limitations of the algorithms, we felt it important
to implement a real-time intrusion detection system that can be deployed in a
real installation. In order to implement a real-time prototype, we performed a
feasibility study, determined how to collect audit data in real time, modified
our algorithms to work in a real-time environment, then designed and
implemented a working prototype. These steps are described briefly in this section.

The first task in creating a real-time intrusion detection tool was to make 
sure that our approach was actually feasible in a real-time environment. In order 
to work in real-time, the intrusion detection prototype should be able to process 
audit data that is generated by a computer under normal use, as fast, or faster 
than the data is being generated. We measured this by collecting a set of audit 
data, and then measuring how long it took us to process that data off-line. 

Our first approach was to use praudit, the built-in Solaris utility for trans- 
lating binary BSM files to a text format, to translate the collected BSM files, 
and perform simple processing on the result. We did this because our off-line
evaluation techniques processed praudit format data, and not BSM files
directly. An example of the timing results obtained when processing BSM files
in this way follows.

Amount of data processed: 

— amount of BSM data: 8,195,371 bytes 

— number of events: 48,871 

— time frame that BSM data was collected over: 5 minutes 3 seconds 

Amount of CPU time required:

— clock time: 14 minutes 57.79 seconds 

— user cpu time: 5 minutes 20.98 seconds 

— system cpu time: 6 minutes 38.27 seconds 

As can be seen, processing this data took longer than it took the system to 
create the data. The solution to this problem was to not use praudit, but rather
to process the binary BSM data directly. When we did this, processing the above 
data set gave us better timing results as shown below. 

Amount of data processed: 

— amount of BSM data: 8,195,371 bytes 

— number of events: 48,871 

— time frame that BSM data was collected over: 5 minutes 3 seconds 

Amount of CPU time required: 

— clock time: 1 minute 48.12 seconds

— user cpu time: 0 minutes 10.99 seconds 

— system cpu time: 1 minute 36.96 seconds 

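These results show that it is feasible to implement our approach in real time:
with direct parsing of the binary BSM data, processing takes less time than
the data took to generate.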
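The feasibility argument comes down to one ratio: processing time divided by the time over which the data was collected. The short sketch below recomputes that ratio from the clock times reported above; the function name `realtime_factor` and the constant names are illustrative and do not come from the original prototype.

```python
def realtime_factor(processing_seconds, collection_seconds):
    """Processing time divided by collection time.

    A value below 1.0 means the processor keeps up with the rate at which
    audit data is generated; a value above 1.0 means it falls behind.
    """
    return processing_seconds / collection_seconds


# Figures taken from the two timing listings in this section.
COLLECTION = 5 * 60 + 3            # BSM data collected over 5 min 3 s
PRAUDIT_CLOCK = 14 * 60 + 57.79    # clock time when going through praudit
DIRECT_CLOCK = 1 * 60 + 48.12      # clock time when parsing binary BSM directly

for label, clock in (("praudit", PRAUDIT_CLOCK), ("direct", DIRECT_CLOCK)):
    print(f"{label}: factor {realtime_factor(clock, COLLECTION):.2f}")
```

With these numbers the praudit path needs roughly three times the collection interval, while direct parsing needs roughly a third of it.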

We had to make sure our intrusion detection algorithms were amenable to a 
real-time domain. This meant three things. First, the algorithms had to run fast 
enough. Second, they had to be able to process data as it is generated, and not 
require all of the audit data at the same time. Third, the algorithm had to be 
reentrant, meaning that it had to process multiple data streams simultaneously. 

We chose the Elman networks as the first intrusion detection algorithm to 
implement in a real-time prototype. Neural networks perform recall quickly, so 
the first real-time requirement was already satisfied. The way that we use the 
Elman nets in the off-line evaluations was to process the data in order, so it 
already met the second real-time requirement as well. The third requirement was 
not met, because our implementations of the Elman Nets were in C, which meant 
that only one instance would exist at a time. This was not satisfactory because in 
a real-time environment it is possible to have multiple copies of the same program 
being run at the same time, and it is important that each execution is evaluated 
by its own neural net. To solve this problem, we modified the Elman networks 
so that they were implemented as C++ objects, allowing multiple instantiations
to exist simultaneously. This satisfied our third real-time requirement. 
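The third requirement can be pictured as one model instance per running process. The sketch below is written in Python rather than the C++ of the prototype, and every name in it (`ExecutionMonitor`, `MonitorPool`, `dispatch`) is hypothetical; the `context` list merely stands in for the recurrent state an Elman network would carry between events.

```python
class ExecutionMonitor:
    """Per-execution state; `context` stands in for an Elman context layer."""

    def __init__(self, program):
        self.program = program
        self.context = []

    def observe(self, event):
        # A real model would update its recurrent state and return a score;
        # this placeholder only records the event and returns a count.
        self.context.append(event)
        return len(self.context)


class MonitorPool:
    """Routes each event to the monitor that owns that process ID."""

    def __init__(self):
        self._monitors = {}

    def dispatch(self, pid, program, event):
        # Create a fresh monitor the first time a process ID is seen.
        if pid not in self._monitors:
            self._monitors[pid] = ExecutionMonitor(program)
        return self._monitors[pid].observe(event)

    def context_of(self, pid):
        return list(self._monitors[pid].context)


# Two interleaved executions of the same program keep separate state.
pool = MonitorPool()
for pid, event in [(101, "open"), (202, "open"), (101, "read"), (202, "close")]:
    pool.dispatch(pid, "ftpd", event)
```

Because each process ID owns its own object, interleaved events from concurrent executions of one program never mix their histories.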

Our last step was to actually design and implement a real-time prototype. 
This was a straightforward software engineering task. We designed the prototype
such that it is modular enough to incorporate the other intrusion detection 
algorithms in a plug-and-play manner. We reviewed it to make sure that it would 
achieve our immediate goals of having something to use for internal testing and 
for ease of modification. 

The initial prototype has now been implemented and is in testing. A demon-
stration of the real-time prototype has been created as well. In the process of cre- 
ating the real-time prototype, we created a library called BSMart (“be smart”) 
to parse BSM data directly from the operating system in binary form in any 
number of configurable ways. Because we deem this library to be a valuable 
contribution to the ID community wishing to perform host-based intrusion de- 
tection on the Solaris platform, we are releasing the library in source code form 
to the research community. The goal is to foster research in host-based intru- 
sion detection by eliminating obstacles (such as engineering a prototype to read 
BSM data directly from the platform) for other researchers. Please contact the 
authors for more information on how to download the library. 

6 Conclusions 

Most of today’s commercial intrusion detection systems are designed only to de- 
tect known attacks. Because new attacks are discovered on a weekly and some- 
times daily basis, we feel it is imperative that approaches to detecting novel 
attacks be developed. To this end, we have developed an anomaly detection 
approach that learns normal program behavior. 

We implemented three different machine learning algorithms for the purpose 
of program-based intrusion detection: Elman artificial neural networks, a string 
transducer, and a state tester. The results from evaluating these algorithms in 
the 1999 Lincoln Laboratory/DARPA Intrusion Detection evaluation are sum- 
marized here. The results demonstrate that these techniques are very good at 
detecting user-to-root types of attacks, and program misuse attacks in general, 
with low false alarm rates. 

We have implemented a real-time prototype that implements the Elman net- 
work. The robust real-time prototype allows for swapping in and out different 
ID algorithms, including the three described in this paper. We have
released the real-time BSM parser, BSMart, to other researchers in the
community. In the future, we intend to release our robust real-time prototype as
well.

Abstract. Audit trail patterns generated on behalf of a Unix process
can be used to model the process behavior. Most of the approaches pro- 
posed so far use a table of fixed-length patterns to represent the pro- 
cess model. However, variable-length patterns seem to be more naturally 
suited to model the process behavior, but they are also more difficult to 
construct. In this paper, we present a novel technique to build a table of 
variable-length patterns. This technique is based on Teiresias, an algo- 
rithm initially developed for discovering rigid patterns in unaligned bio- 
logical sequences. We evaluate the quality of our technique in a testbed 
environment, and compare it with the intrusion-detection system pro- 
posed by Forrest et al. [8], which is based on fixed-length patterns. The 
results achieved with our novel method are significantly better than those 
obtained with the original method based on fixed-length patterns.

Keywords: Intrusion detection, Teiresias, pattern discovery, pattern 
matching, variable-length patterns, C2 audit trail, functionality verifi- 
cation tests. 



1 Introduction 

In [9], Forrest et al. introduced a new approach to the problem of protecting
computer systems. The problem is viewed as an instance of the more general problem
of distinguishing self (i.e. normal process execution) from other (i.e. anomalous
process execution). Based on the way natural immune systems distinguish self
from other, Forrest et al. have developed a change-detection method that can
be applied to virus detection [9] and intrusion detection [8]. The method models
the way an application or service running on a machine normally behaves by 
registering characteristic subsequences, i.e. patterns, of system calls invoked. An 
intrusion is assumed to pursue abnormal paths in the executable code, and is de- 
tected when new sequences are observed that cannot be matched with registered 
patterns (see also [6, 7]). 

Forrest et al. use fixed-length patterns to represent the process model. How- 
ever, a main limitation of this approach is that there is no rationale for selecting 
the optimal pattern length. As shown in [10], the pattern length has an influence 
on the detection capabilities of the intrusion-detection system. Therefore, in [2] 



H. Debar, L. Mé, and F. Wu (Eds.): RAID 2000, LNCS 1907, pp. 110-129, 2000.
© Springer-Verlag Berlin Heidelberg 2000 



Intrusion Detection Using Variable-Length Audit Trail Patterns 






the concept of using variable-length patterns to model the process behavior was 
introduced. However, preliminary results obtained with variable-length patterns 
revealed no clear advantage of that method. In this paper, we present a novel 
method to generate variable-length patterns. We can show that the results ob- 
tained with variable-length patterns clearly outperform those achieved with the 
original method, which is based on fixed-length patterns. 

The structure of the paper is as follows. Section 2 describes the basic prin- 
ciples of detecting suspicious process behavior by analyzing the sequences of 
system calls a process can generate. Readers familiar with the previous work on 
this topic [2, 3, 8, 10, 11, 13, 14, 15] can skip this section and go directly to Sec- 
tion 3 where our novel intrusion-detection method, which uses variable-length 
patterns, is presented. Section 4 compares our novel method with the one pro- 
posed by Forrest et al. [8, 10] based on experiments performed in a testbed [5] 
environment. Section 5 concludes the paper by summarizing the results obtained 
and offering ideas for future work. In the Appendix, formal descriptions of the 
variable-length pattern-extraction and the variable-length pattern-matching al- 
gorithm are given. 



2 Background 

We describe the basic principles of intrusion-detection systems that use char- 
acteristic subsequences of system call traces to model the process behavior and 
to detect intrusions by looking for deviations from the process model. First, we 
show the generic architecture of such intrusion-detection systems. Then we de- 
scribe in more detail the intrusion-detection system proposed by Forrest et al. 
[8, 10], which will be used as the reference system to evaluate the quality of our 
novel approach. 

2.1 Architecture 

The intrusion-detection system proposed by Forrest et al. [8, 10] is a behavior- 
based [4] intrusion-detection system. In a training phase, normal process behav- 
ior is defined. During real-time operation, it is decided whether the observed 
process behavior corresponds to the learned normal behavior, or whether signif- 
icant deviations are observed, which may be an indication of an intrusion. 

There are different interpretations of what the expression “normal” behavior 
means. In [10] the authors differentiate between synthetic normal and real normal 
behavior. Synthetic normal behavior is created by exercising a program in an 
isolated environment in as many modes as possible and recording its behavior. 
Real normal behavior is observed by tracing the behavior of a program in a live 
user environment. For a discussion of the advantages and disadvantages of each 
approach see [2]. 

Forrest et al. [8, 10] are mainly interested in real normal behavior because 
this allows them to detect abnormal but legitimate behavior, i.e., behavior that 
is valid according to the process specification but has not been seen during the 
training phase. In our work, we concentrate on synthetic normal behavior and
furthermore try to learn the normal process behavior exhaustively. We achieve 
this by using functionality verification test suites (FVT) that systematically 
exercise all valid process invocations. Our objective is to detect attacks against 
the process itself, i.e. attacks that succeed in exercising process execution paths 
that were hitherto unknown and do not correspond to the process specification. 
However, it is important to note that the intrusion-detection technique itself, 
specifically whether to use fixed- or variable-length patterns, does not depend 
on the method used to learn normal behavior. 

The architecture of our intrusion-detection system is depicted in Figure 1. 
The system comprises two main parts: an off-line part, which corresponds to the 
training system, and an on-line part, which corresponds to the detection system. 
The main components of each part are described in the next two subsections. 




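[Figure 1: block diagram with an OFF-LINE part (training system) and an
ON-LINE part (detection system); the surviving block labels are Translation,
Pattern, and Table.]

Fig. 1. Intrusion-detection system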



Training System The behavior of the process under study is traced by record- 
ing either the system calls or the audit events generated on behalf of the process. 
In [8], system calls are used. Although on most operating systems not every sys- 
tem call is represented as an audit event, it has been shown in [2] that audit 
events are a viable alternative offering the same detection capabilities. For our 
work we use audit events because, as our experiments have shown, collecting 
audit events is a less intrusive technique than recording system calls. 

The audit events generated on behalf of different process executions are sent 
to a filtering module. Its task is to sort the events by process id while keeping 
the chronological event order. The events are given as tuples comprising the 
process and event name. For easier processing, the translation module translates 
the events into an internal format. We use characters to represent this internal 
format. The translation rules are generated on the fly and stored in a translation
table. Figure 2 shows the translation steps from the stream of audit events to 
the sequences of characters. 



(a) Extract from audit trail (some fields omitted)

    PID    CMD      EVENT        USER
    16415  ftpd     FILE_Open    root
    16415  ftpd     FILE_Open    root
    18210  fingerd  FILE_Read    root
    16415  ftpd     FILE_Read    root
    18303  ftpd     PROC_Create  root
    16415  ftpd     FILE_Close   root
    18303  ftpd     FILE_Close   root
    18303  ftpd     FILE_Close   root

Translation table

    ftpd/FILE_Open     A
    ftpd/FILE_Read     B
    ftpd/FILE_Close    C
    ftpd/PROC_Create   D

(b) Audit events sorted by process ID

    16415  ftpd/FILE_Open
           ftpd/FILE_Open
           ftpd/FILE_Read
           ftpd/FILE_Close

    18303  ftpd/PROC_Create
           ftpd/FILE_Close
           ftpd/FILE_Close

(c) Sequences of characters

    16415  AABC...
    18303  DCC...

Fig. 2. Translation of audit events to characters
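The filtering and translation steps of Figure 2 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `sort_by_pid` and `translate` are invented here, restricting the input to one program is an assumption made to match the figure, and using the uppercase alphabet limits the sketch to 26 distinct events.

```python
import string

# (PID, CMD, EVENT) tuples from Figure 2(a); the USER field is omitted.
AUDIT_TRAIL = [
    (16415, "ftpd", "FILE_Open"),
    (16415, "ftpd", "FILE_Open"),
    (18210, "fingerd", "FILE_Read"),
    (16415, "ftpd", "FILE_Read"),
    (18303, "ftpd", "PROC_Create"),
    (16415, "ftpd", "FILE_Close"),
    (18303, "ftpd", "FILE_Close"),
    (18303, "ftpd", "FILE_Close"),
]


def sort_by_pid(events, program):
    """Filtering step: group one program's events by PID, in arrival order."""
    by_pid = {}
    for pid, cmd, event in events:
        if cmd == program:
            by_pid.setdefault(pid, []).append(f"{cmd}/{event}")
    return by_pid


def translate(by_pid, alphabet=string.ascii_uppercase):
    """Translation step: assign a new character to each unseen event name."""
    table = {}
    sequences = {}
    for pid, names in by_pid.items():
        chars = []
        for name in names:
            if name not in table:
                table[name] = alphabet[len(table)]  # rule created on the fly
            chars.append(table[name])
        sequences[pid] = "".join(chars)
    return table, sequences


table, sequences = translate(sort_by_pid(AUDIT_TRAIL, "ftpd"))
print(table)
print(sequences)
```

Translating the per-process sequences in order of first appearance reproduces the assignment in the figure: process 16415 yields AABC and process 18303 yields DCC.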



The translated sequences are forwarded to the aggregation and reduction 
module. The purpose of this module is twofold: 

— It aggregates consecutive occurrences of the same character, i.e., of the same 
event. 

— It removes duplicate sequences. 

The following example shows the usefulness of the aggregation of consecutive 
identical characters. When a new instance of the ftpd process is invoked, it 
inherits several file handles from inetd, its parent process. As one of its first 
tasks, the ftp daemon closes the file handles, resulting in consecutive FILE_Close
events. The number of inherited file handles may vary because the inetd process
is not always in the same state. Closing all unneeded file handles will therefore
result in a varying number of FILE_Close events. As a consequence, the resulting
sequence of system calls is dependent on the environment in which the process 
runs. Because we would like to have a process description that is independent of 
the environment, we aggregate consecutive occurrences of the same character. 

There are different ways to do the aggregation. We follow the approach pro- 
posed in [2] and aggregate identical consecutive characters in a single character, 
i.e., A = A+ in regular expression formalism. We make no claim of equivalence 
between the simplified event sequence and the original one. The aggregation is 
an experimental choice and would be removed if any negative impact on the 
detection capabilities of the intrusion-detection system were observed. 

During the training phase, duplicate event sequences may occur. Because
they contain no new patterns, duplicate event sequences do not have to be con- 
sidered and are hence removed. 
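Both tasks of the aggregation and reduction module are small enough to sketch directly. The helper names `aggregate` and `reduce_sequences` are illustrative; the behavior follows the description above, collapsing each run of identical characters to a single character and then dropping sequences that have already been seen.

```python
from itertools import groupby


def aggregate(sequence):
    """Collapse each run of identical characters into one character."""
    return "".join(char for char, _run in groupby(sequence))


def reduce_sequences(sequences):
    """Aggregate every sequence, then drop duplicates, keeping first-seen order."""
    seen = set()
    reduced = []
    for sequence in sequences:
        aggregated = aggregate(sequence)
        if aggregated not in seen:
            seen.add(aggregated)
            reduced.append(aggregated)
    return reduced


# Two ftpd runs that differ only in how many file handles they close
# become identical after aggregation, so only one copy is kept.
print(reduce_sequences(["DCC", "DCCCC", "AABC"]))
```

This shows why aggregation makes the process description less dependent on the environment: runs that close two or four inherited handles map to the same sequence.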

After all process executions have taken place, the preprocessed sequences are 
forwarded to the pattern-extraction module where the pattern table is generated.



Detection System The structure of the detection system is similar to that of 
the training system. In the detection system, events generated on behalf of the 
process under study are collected and processed in real time. The filtering module 
is identical to that of the training system. The translation module is slightly 
different from its counterpart in that audit events are translated based on the 
entries already contained in the translation table. Events without a corresponding 
entry in the translation table constitute quite an unusual event because they 
have not been seen in the training phase of the process. They are translated into 
a dummy character, and one may consider issuing an alarm whenever such a 
character is observed. In the current implementation of our intrusion-detection 
system, they are treated the same way as unmatched characters. 
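At detection time the translation table is frozen, so the translation step only looks entries up. The generator below is a sketch under stated assumptions: the dummy character `?`, the function name `translate_online`, and the event name `ftpd/PROC_Exec` are all hypothetical. It also folds in the aggregation step, emitting a character only when it differs from the previous one, so that matching can start before the sequence is complete.

```python
def translate_online(event_names, table, dummy="?"):
    """Translate a live event stream with a fixed table, aggregating repeats.

    Events with no entry in the table map to the dummy character.
    """
    previous = None
    for name in event_names:
        char = table.get(name, dummy)
        if char != previous:      # aggregation: skip consecutive repeats
            yield char
        previous = char


TABLE = {"ftpd/FILE_Open": "A", "ftpd/FILE_Read": "B", "ftpd/FILE_Close": "C"}
STREAM = [
    "ftpd/FILE_Open",
    "ftpd/FILE_Open",
    "ftpd/PROC_Exec",    # hypothetical event never seen during training
    "ftpd/FILE_Close",
    "ftpd/FILE_Close",
]
print("".join(translate_online(STREAM, TABLE)))
```

Since the dummy character never occurs in a trained pattern, any window containing it is necessarily a mismatch, which matches the current treatment of unseen events as unmatched characters.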

The reduction component of the reduction and aggregation module is no 
longer needed. Pattern matching is done in real time, and initiated as soon 
as possible for each sequence. This means we do not wait until the complete 
sequence has been received before the pattern matching is started, and therefore 
the reduction of entire sequences is not applicable. 

The task of the pattern-matching module is to match the arriving event se- 
quences with the entries in the pattern table. Based on how well the pattern 
matching can be done, it is decided whether anomalous behavior is observed 
and thus an alarm has to be raised. 

2.2 A Review of Forrest et al.’s Approach

This description of Forrest et al.’s work is based on the original paper [8] as well 
as a more recent publication [10] in which some modifications of the original 
concepts are described. We show the techniques applied for pattern extraction 
and pattern matching as well as the metrics used to differentiate between normal 
and abnormal behavior. 



Pattern Extraction The algorithm to build the table of fixed-length patterns 
is very simple. From the sequences sent to the pattern-extraction module, all 
unique subsequences, i.e. patterns, of a given length k are extracted. This is 
achieved by sliding a window of length k across all input sequences and recording 
the encountered subsequences. Duplicates are not considered. 

The construction of the pattern table is best illustrated with an example. 
For k = 3 and the sample training sequence ABCCABC, we obtain the following
pattern table: 



{ ABC, BCC, CCA, CAB }. 

Note that the pattern ABC shows up only once in the pattern table although it
is encountered at two window positions, namely the first and the last position. 
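The sliding-window construction can be written in a few lines. The function name `build_pattern_table` is illustrative; using a set makes the "duplicates are not considered" rule automatic.

```python
def build_pattern_table(sequences, k):
    """Collect every distinct length-k window over all training sequences."""
    table = set()
    for sequence in sequences:
        for start in range(len(sequence) - k + 1):
            table.add(sequence[start:start + k])
    return table


print(sorted(build_pattern_table(["ABCCABC"], k=3)))
```

For the sample sequence this yields the four patterns listed above; ABC is stored once even though the window produces it at two positions.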

Pattern Matching The pattern-matching technique is similar to the pattern- 
generation technique. We move a window of length k across the sequence that 
is recorded during real operation. Each window position is checked for a match, 
i.e., whether there is a pattern that matches the subsequence in the window. If 
no matching pattern exists, we speak of a mismatch. 

Given the pattern table of the previous example and the sample sequence 
ABCCACC, we observe three matches, namely {ABC, BCC, CCA}, and two
mismatches, namely {CAC, ACC}.
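Matching reuses the same window, this time over the observed sequence. In this sketch the name `match_windows` is illustrative and the pattern table is written out literally so that the example is self-contained.

```python
PATTERN_TABLE = {"ABC", "BCC", "CCA", "CAB"}   # table built from ABCCABC, k = 3


def match_windows(sequence, table, k):
    """Split the length-k windows of a sequence into matches and mismatches."""
    matches, mismatches = [], []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        (matches if window in table else mismatches).append(window)
    return matches, mismatches


matches, mismatches = match_windows("ABCCACC", PATTERN_TABLE, k=3)
print(matches, mismatches)
```

Because each window is judged as soon as its last character arrives, the check can run on a live stream without waiting for the process to finish.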

Metric Note that the measure for raising an alarm must not depend on the 
sequence length. Arriving events have to be processed in real time, and we do 
not want to wait until all events of a process have arrived before we check 
them for possible signs of intrusions. This would be problematic, for example, 
in cases of continuously running processes. In [10], three measures are given to 
differentiate between normal and abnormal behavior. However, only the measure 
we are going to describe in this section is independent of the sequence length. 

Let $a$ and $b$ be two sequences of length $k$. The expression $a_i$ designates
the character at position $i$. The difference $d(a,b)$ between $a$ and $b$ is
defined as

\[
d(a,b) = \sum_{i=1}^{k} f_i(a,b), \qquad
f_i(a,b) = \begin{cases} 0 & \text{if } a_i = b_i, \\ 1 & \text{otherwise.} \end{cases}
\]

During pattern-matching, we determine for each subsequence $u$ of the translated
event sequence the minimum distance $d_{\min}(u)$ between $u$ and the entries in the
pattern table:

\[
d_{\min}(u) = \min \{\, d(u,p) \;\; \forall \text{ patterns } p \,\}.
\]

To detect an attack, at least one of the subsequences generated by the attack
must be classified as anomalous. In terms of the above measure, there is at least
one subsequence $u$ for which

\[
d_{\min}(u) > 0.
\]

It is assumed that the higher the $d_{\min}$ value, the more likely it is that the
subsequence was actually generated by an intrusion. In practice, the maximum
$d_{\min}$ value observed is used as the measure for an intrusion because it
represents the strongest anomalous signal. The signal of anomaly, $S_a$, is
defined as

\[
S_a = \max \{\, d_{\min}(u) \;\; \forall \text{ subsequences } u \,\}.
\]

In the ideal case, an $S_a$ value that is greater than 0 can be considered a sign
of an intrusion. However, as experimental results show, a complete match
cannot always be achieved [10]. Therefore, a threshold is defined such that only
sequences whose $S_a$ value is above this threshold are considered suspicious.
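The three quantities of the metric compose directly, as the following sketch shows for the running example. It assumes that the difference between two windows is the number of positions at which they differ; the function names are illustrative.

```python
PATTERN_TABLE = {"ABC", "BCC", "CCA", "CAB"}   # table built from ABCCABC, k = 3


def difference(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def d_min(window, table):
    """Smallest difference between a window and any pattern in the table."""
    return min(difference(window, pattern) for pattern in table)


def signal_of_anomaly(sequence, table, k):
    """Largest d_min over all length-k windows seen so far."""
    return max(
        d_min(sequence[start:start + k], table)
        for start in range(len(sequence) - k + 1)
    )


print(signal_of_anomaly("ABCCACC", PATTERN_TABLE, k=3))
```

For ABCCACC the two mismatching windows CAC and ACC each differ from their closest pattern in a single position, so the signal of anomaly is 1; for the training sequence itself it is 0.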

3 Variable-Length Patterns

Before building a table of fixed-length patterns, one has to decide which pattern 
length to use. However, selecting the most appropriate pattern length is not 
straightforward: 

— Long patterns are expected to be more process-specific than short patterns. 
The longer a pattern, the lower the probability that a pattern would match 
part of an event sequence generated on behalf of an attack. 

— It is desirable to have a small pattern table because it reduces the amount of 
computation needed for the detection process. As experimental results show, 
increasing the pattern length to a certain length also increases the size of 
the corresponding pattern table [2]. 

Using variable-length patterns enables us to cope with these two apparently 
contradictory constraints. To describe the normal behavior of a process, variable-
length patterns appear to be more naturally suitable than fixed-length patterns.
A careful look at the sequences of events that can be generated by a process 
shows that there are many cases in which very long subsequences are repeated 
frequently. For example, more than 50% of the process images we have obtained 
for the ftpd process start with the same string. After aggregation, this string 
contains 40 audit events and should be incorporated as a whole in the pattern 
table. However, approaches based on fixed-length patterns use much shorter 
pattern lengths and would therefore not detect such a long pattern. 

Variable-length patterns are also motivated by the fact that, for example, the 
ftp daemon answers user commands, and that each such command can probably 
be represented by its own sequence of audit events. 

Variable-length patterns are not as easy to generate as fixed-length patterns. 
A technique based on building and pruning suffix trees [2] showed that variable- 
length patterns are an interesting alternative to fixed-length patterns, but it also 
showed some limitations of the chosen pattern-generation technique. 

3.1 Pattern Extraction 

We present a novel method to generate the table of variable-length patterns. 
This method comprises two steps. In the first step, all maximal variable-length 
patterns contained in the set of training sequences are determined. Because the 
patterns can share common subsequences, not all patterns may be needed to 
cover, i.e. fully match, the training sequences. Therefore, in the second step, a 
reduction algorithm is applied to prune entries in the pattern table. The goal is 
to obtain the minimum pattern set that still covers all training sequences. 



Generating the Pattern Set The input to the pattern-extraction module (see 
Fig. 1) are sequences of audit events that have been preprocessed as described 
in Section 2.1. We define a variable-length pattern as a subsequence that has a 
minimum length of two and occurs at least twice, be it in the same or in different
sequences. Furthermore, we consider only maximal variable-length patterns. A 
pattern p is maximal if there is no other pattern q that contains the pattern 
p as a subsequence and has the same number of occurrences as pattern p. For 
example, if there are two patterns DEA and EA, pattern EA is considered maximal 
only if it occurs more often than pattern DEA. 

There are several algorithms to determine variable-length patterns [1]. We 
use the Teiresias algorithm [12], an algorithm developed initially to discover 
rigid patterns in unaligned biological sequences. Teiresias has many interesting 
properties. It is well suited to our problem for the following two main reasons: 

— It finds the maximal variable-length patterns by avoiding the generation 
of non-maximal intermediate patterns during the pattern-extraction pro- 
cess [12]. 

— Its performance scales quasilinearly with the size of the output [12]. 

It follows that Teiresias very efficiently finds all the maximal variable-length 
patterns in the set of training sequences. 



Reducing the Pattern Set We want the pattern set to be as process-specific as 
possible. This means that the pattern set should contain all the patterns needed 
to cover the training sequences but not more. The set of maximal variable- 
length patterns usually contains overlapping patterns, i.e. patterns that have 
common subsequences. Let us have a look at the following sample set of training 
sequences: 



{ ABCDEAFDE, BCFDEABCD, BCEADEFDE }. 

Extracting the maximal variable-length patterns results in the following pattern 
table: 



{ ABCD, DEA, FDE, BC, DE, EA }. 
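The definition of a maximal variable-length pattern can be checked against this example by brute force. The sketch below is not Teiresias: it enumerates every substring, which is practical only for tiny inputs, and the function name `maximal_patterns` is illustrative. It applies the definition literally, requiring a minimum length of two, at least two occurrences, and no longer pattern that contains the candidate with the same occurrence count.

```python
from collections import Counter


def maximal_patterns(sequences, min_len=2, min_count=2):
    """Brute-force enumeration of maximal variable-length patterns."""
    counts = Counter()
    for sequence in sequences:
        for start in range(len(sequence)):
            for end in range(start + min_len, len(sequence) + 1):
                counts[sequence[start:end]] += 1

    # Keep substrings that occur often enough to qualify as patterns.
    patterns = {p: c for p, c in counts.items() if c >= min_count}

    # A pattern is maximal unless a longer pattern contains it equally often.
    return {
        p
        for p, c in patterns.items()
        if not any(q != p and p in q and cq == c for q, cq in patterns.items())
    }


TRAINING = ["ABCDEAFDE", "BCFDEABCD", "BCEADEFDE"]
print(sorted(maximal_patterns(TRAINING)))
```

On the three sample sequences this returns the six patterns of the table above. For instance, AB is dropped because it occurs exactly as often as ABCD, whereas EA is kept because it occurs more often than DEA.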

The question arises whether all patterns are needed to cover the training se- 
quences. Let us decompose the training sequences such that the resulting sub- 
sequences correspond to entries in the pattern table. A possible decomposition 
of the training sequences is listed below. We use the symbol “-” to mark the
decomposition points:

{ ABCD-EA-FDE, BC-FDE-ABCD, BC-EA-DE-FDE }. 

As we can see, of the six patterns in the pattern table only five are needed in 
the above decomposition. The pattern DEA is not used. We conclude that the 
pattern set determined by Teiresias can be reduced. 
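Whether a given subset of the pattern table still covers a sequence can be checked mechanically. The following sketch uses a simple dynamic program over prefixes; it is an illustration under the assumption that a decomposition is a concatenation of table entries, and it is not the reduction algorithm described below. The name decompose is hypothetical.

```python
def decompose(seq, patterns):
    """Return one split of seq into table entries, or None if none exists."""
    # best[i] holds a decomposition of seq[:i], or None if seq[:i] has none.
    best = [None] * (len(seq) + 1)
    best[0] = []
    for i in range(1, len(seq) + 1):
        for p in patterns:
            start = i - len(p)
            if start >= 0 and best[start] is not None and seq[start:i] == p:
                best[i] = best[start] + [p]
                break
    return best[len(seq)]


reduced = ["ABCD", "EA", "FDE", "BC", "DE"]  # the table without DEA
for s in ["ABCDEAFDE", "BCFDEABCD", "BCEADEFDE"]:
    print("-".join(decompose(s, reduced)))
```

All three sample sequences can be decomposed without the pattern DEA, in line with the observation above.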

There are various ways to construct the reduced pattern set. The rationale
for the approach described in the remainder of this section is based on the
observation that some patterns have a clear semantic meaning.
A pattern may, for example, represent a subroutine that is invoked several times
or the statements that are executed in a loop. Such patterns can be regarded
as building blocks out of which the event sequences of any possible process
instantiation can be composed.

In our experiments, we observed that many training sequences have the same 
beginning and end, i.e., the same initialization and termination routine is exe- 
cuted for different process instantiations. As a first step, we can add the cor- 
responding pattern to the reduced pattern set. Subsequences that match this 
pattern are removed from the training sequences, and the reduction process con- 
tinues with the pruned training sequences. This procedure is reiterated until no 
training sequences are left, i.e., until all training sequences can be covered with 
the patterns added to the reduced pattern set. 

There is a single requirement that must be fulfilled by the reduced pattern 
set: 



— The training sequences must be covered by the patterns in the reduced pat- 
tern set. 

In addition, as explained at the beginning of Section 3, the following properties 
are desirable: 

— The reduced pattern set should contain long patterns.

— The number of patterns in the reduced pattern set should be small. 

The two inputs for the reduction algorithm are the pattern table as produced 
by the Teiresias algorithm and the set of training sequences. The algorithm itself 
comprises four steps, which are executed repeatedly until all training sequences 
have been processed. We outline here only the basic steps of the algorithm. A 
detailed description can be found in Appendix A.2.

Step 1 

The function bCover(p, s) returns the number of characters covered at the 
beginning and at the end of a sequence s by a pattern p. bCover(p, s) considers 
the fact that a pattern may match several times at the beginning or end of a 
sequence, e.g. 



bCover(AB, ABCDEABAB) = 6. 

If S designates a set of sequences, bCover(p, S) is the sum of all events 
matched at the beginning and end of all sequences by the pattern p. We call 
the returned value the boundary coverage. 

For each entry in the pattern table, we calculate its boundary coverage of 
the set of training sequences. The pattern with the highest boundary coverage is 
added to the reduced pattern set. This pattern is used further in Steps 2 and 3. 
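A minimal sketch of the boundary coverage and of the selection in Step 1 is given below. It assumes that sequences and patterns are plain character strings, and it breaks ties in favor of the longer pattern, as stated in Appendix A.2. The function names are illustrative.

```python
def bcover(p, s):
    """Number of characters of s covered by repetitions of p at both ends."""
    left = 0
    while s.startswith(p, left * len(p)):
        left += 1
    rest = s[left * len(p):]  # what remains after the leading repetitions
    right = 0
    while rest.endswith(p, 0, len(rest) - right * len(p)):
        right += 1
    return (left + right) * len(p)


def bcover_set(p, sequences):
    """Boundary coverage of p summed over a set of sequences."""
    return sum(bcover(p, s) for s in sequences)


def select_pattern(table, sequences):
    """Pattern with the highest boundary coverage; ties go to the longer one."""
    return max(table, key=lambda p: (bcover_set(p, sequences), len(p)))


print(bcover("AB", "ABCDEABAB"))  # one match in front, two at the end
```

Counting the leading repetitions first and the trailing ones on the remainder ensures that a sequence consisting only of repetitions of p is not counted twice.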

Step 2 

All the subsequences at the beginning and end of the training sequences that
are matched by the pattern determined in Step 1 are removed. For example, if
the pattern AB is selected in Step 1, the sequence ABABCDAB will be transformed
as follows:



ABABCDAB -> CD. 

Furthermore, we have to avoid training sequences being reduced to sequences 
that are shorter than the minimal pattern length. By definition, there would 
be no pattern to match such a short sequence. For example, if the minimal 
pattern length is two and ABC is an entry in the pattern table, the following 
transformation of the sequence ABCD is invalid: 

ABCD -> D,

because the remaining sequence D is shorter than the minimum pattern length.
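Step 2 can be sketched as follows. The sketch assumes that a repetition is stripped only if the remainder is either empty or at least as long as the minimum pattern length; how the original implementation handles partially strippable boundaries is not stated, so this is one possible reading. The name strip_boundary is hypothetical.

```python
def strip_boundary(s, p, min_len):
    """Remove repetitions of p from both ends of s, keeping the rest valid."""

    def allowed(remaining):
        # The remainder may vanish entirely or must be long enough to match.
        return remaining == 0 or remaining >= min_len

    while s.startswith(p) and allowed(len(s) - len(p)):
        s = s[len(p):]
    while s.endswith(p) and allowed(len(s) - len(p)):
        s = s[:-len(p)]
    return s


print(strip_boundary("ABABCDAB", "AB", 2))  # the example from the text
print(strip_boundary("ABCD", "ABC", 2))     # left unchanged
```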

Step 3 

After removing the matching subsequences at the boundary of the training
sequences, we now also remove matching subsequences that are not adjacent
to the boundary. We call this process nonboundary matching. Removing such a
subsequence splits the original sequence into two new sequences.
As in the case of boundary matching, it has to be ensured that the length of the
resulting sequences is equal to or greater than the minimum pattern length. If a
sequence has several subsequences that can be matched, the longest subsequence
is removed first. Nonboundary matching may again be applied to the resulting
sequences. For example, given the pattern AB and a minimal pattern length of
two, the following transformation can be applied to the sequence CDABABEFABGH:

CDABABEFABGH -> { CD, EFABGH } -> { CD, EF, GH }.
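The nonboundary matching of Step 3 can be sketched recursively. The sketch treats a run of consecutive repetitions of the pattern as one matching subsequence and only removes a run if both resulting pieces reach the minimum pattern length; it does not try to remove part of a run, which is a simplifying assumption. The name split_nonboundary is hypothetical.

```python
import re


def split_nonboundary(s, p, min_len):
    """Split s at interior runs of p, longest run first."""
    runs = [
        m for m in re.finditer(f"(?:{re.escape(p)})+", s)
        if m.start() >= min_len and len(s) - m.end() >= min_len
    ]
    if not runs:
        return [s]
    longest = max(runs, key=lambda m: m.end() - m.start())
    # Removing the run yields two new sequences, which are processed again.
    return (split_nonboundary(s[:longest.start()], p, min_len)
            + split_nonboundary(s[longest.end():], p, min_len))


print(split_nonboundary("CDABABEFABGH", "AB", 2))
```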



Step 4 

No further transformation can be applied to sequences whose length is less 
than two times the minimal pattern length. Any further transformation would 
result in a new sequence that is less than the minimal pattern length, which 
contradicts our requirements. As a result, any sequence that cannot be further 
reduced will be added to the reduced pattern set. However, they are first moved 
to the pattern table and treated the same way as the patterns determined by the 
Teiresias algorithm. Note that, as a consequence, the reduced pattern set may 
contain entries that were not determined by Teiresias. 

If after execution of Step 4 no sequences remain in the training set, the 
reduction algorithm terminates, otherwise execution continues at Step 1. 

Figure 3 illustrates how the algorithm works for a sample training set of three 
sequences. In this example, five steps are needed to derive the reduced pattern 
table. For each step, we show the state of the training sequences, the entries 
in the pattern table, their corresponding bCover values, and the state of the 
reduced pattern table.

[Figure 3. Column headings: Training, Pattern, bCover, Reduced]





To show the importance of the reduction algorithm, let us take a look at the 
following numbers. The training sequences of the experiment that we are going 
to describe in Section 4 contained a total of 167,187 patterns. Of this total, 
554 patterns are maximal. These are the patterns that we generate using the 
Teiresias algorithm. It becomes obvious that generating the maximal patterns 
directly as Teiresias does offers a significant advantage over other approaches 
that also generate the intermediate patterns. Of the 554 maximal variable-length 
patterns, a pattern set of only 71 patterns can be constructed that covers all 
the training sequences. This shows the usefulness of reducing the pattern sets 
generated by Teiresias. A pattern-matching process that has to consider only 71 
patterns will run faster than one that has to consider 554 entries. The statistics 
for the variable-length patterns are summarized in Table 1.

Table 1. Example of table sizes of variable-length patterns

Patterns              167,187
Maximal patterns          554
Covering patterns          71



3.2 Pattern Matching 

As stated in the previous section, variable-length patterns can be seen as building 
blocks out of which any valid event sequence can be constructed. This idea is also 
reflected in the juxtaposed pattern-matching technique we apply for variable- 
length patterns. 

The sequence to be matched is processed starting from the beginning of the 
sequence to its end. One out of three conditions holds at a given point of the 
pattern-matching process. 

1. Exactly one pattern matches at a given position. The corresponding events 
are marked as matched and the pattern matching continues right after the 
last event marked. 

2. Several patterns match at a given position. To decide which of the matching 
patterns to select, a look-ahead algorithm determines for a predefined value 
of n whether a sequence of up to n patterns can be found that matches the 
continuation of the sequence. The pattern whose continuation results in the 
longest match is selected, the corresponding events are marked as matched, 
and the pattern matching continues right after the last event marked. 

3. No matching pattern can be found. The event at the current position of the 
pattern-matching process is marked as unmatched and skipped. The pattern 
matching continues right after the skipped event. 

A detailed description of the pattern-matching algorithm can be found in
Appendix A.3.
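The three conditions can be illustrated with a simplified sketch. It assumes that events are single characters, that the whole sequence is available at once, and that the look-ahead simply measures how far up to n juxtaposed patterns can reach; the detailed algorithm in Appendix A.3 differs in its bookkeeping. The names reach and match are hypothetical.

```python
def reach(seq, pos, patterns, depth):
    """Furthest position reachable from pos with at most depth patterns."""
    if depth == 0:
        return pos
    best = pos
    for p in patterns:
        if seq.startswith(p, pos):
            best = max(best, reach(seq, pos + len(p), patterns, depth - 1))
    return best


def match(seq, patterns, n=2):
    """Return one flag per event: True if the event was matched."""
    covered = [False] * len(seq)
    pos = 0
    while pos < len(seq):
        candidates = [p for p in patterns if seq.startswith(p, pos)]
        if not candidates:
            pos += 1  # condition 3: leave the event unmatched and skip it
            continue
        # Conditions 1 and 2: with one candidate max() returns it directly,
        # otherwise the candidate with the longest continuation is taken.
        chosen = max(candidates,
                     key=lambda p: reach(seq, pos + len(p), patterns, n))
        for i in range(pos, pos + len(chosen)):
            covered[i] = True
        pos += len(chosen)
    return covered


print(match("ABCD", ["AB", "ABC", "CD"]))
```

In the usage line, always preferring the longest pattern ABC would leave D unmatched, whereas the look-ahead selects AB followed by CD.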

3.3 Metric 

For each sequence, the pattern-matching algorithm returns the g groups of
consecutive uncovered events and the length $l_i$, $i = 1, \dots, g$, of each of
these groups. It is assumed that the greater the length $l_i$, the more likely it
is that an intrusion is observed. Based on the length of the longest group of
uncovered events, T, it has to be decided whether an attack is observed. T is
defined as follows:

$$T = \max(l_i), \quad i = 1, \dots, g.$$
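As a small illustration, the metric can be computed from a list of flags that state for each event whether it was covered. The flag representation and the convention T = 0 for a fully covered sequence are assumptions of this sketch.

```python
from itertools import groupby


def uncovered_group_lengths(covered):
    """Lengths of the groups of consecutive uncovered events."""
    return [sum(1 for _ in run)
            for is_covered, run in groupby(covered) if not is_covered]


def metric_t(covered):
    """Length of the longest group of uncovered events, 0 if there is none."""
    return max(uncovered_group_lengths(covered), default=0)


flags = [True, False, False, True, False, False, False, True]
print(uncovered_group_lengths(flags), metric_t(flags))
```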



4 Results 

We have set up a test environment [5] to evaluate the quality of Forrest et al.'s
intrusion-detection method and our novel method, which is based on
variable-length patterns. We report the results obtained for the ftpd process.
We focus on this process because it is widely used and is known to contain many
vulnerabilities (either due to software flaws or configuration errors).
Furthermore, it provides a host of possibilities for user interaction and is
therefore a challenging process from an intrusion-detection point of view. We
have also successfully applied our intrusion-detection approach to other network
services, e.g. finger and sendmail. For space reasons, only the results obtained
for the ftp service are presented in this paper.

To train the system, we use the functionality verification test suites (FVT)
running under AIX [3]. The test suite allows us to automatically exercise all ftp
subcommands and thus to learn the complete process behavior.

4.1 Problem Size 

The FVT for the ftp process consists of 487 individual tests. Because many of 
these tests do not differ in the subcommands invoked but only in the arguments 
used, they result in identical sequences of audit events. When running the ftp 
test suite, 68 unique sequences (after aggregation and reduction) were recorded 
comprising a total of 23,302 audit events. Table 2 summarizes these numbers. 



Table 2. Problem size of the ftp experiment

Tests                    487
Training sequences        68
Events                23,302



For the comparison of the fixed- and variable-length approaches, we use two
tables of fixed-length patterns and one of variable-length patterns. The pattern
sizes of the fixed-length pattern tables are six and ten, respectively. Six was
selected because it is stated in [10] that the pattern size makes only little
difference for the normalized signal of anomaly once we have a length of at
least six, and ten because this is the pattern size used in the experiments
reported in [10]. It is worth noting that, coincidentally, the mean pattern length
of the variable-length pattern table is ten. Table 3 lists the size of the respective
pattern tables. We see that the size of the variable-length pattern table is much
smaller than that of the fixed-length pattern tables.

4.2 Normal User Sessions 

In our testbed [5], we simulated series of user sessions. The simulation resulted 
in 65 unique sequences comprising a total of 26,025 audit events. We used these 
sequences to evaluate the quality of the fixed-length and variable-length pattern- 
matching techniques in combination with the respective pattern tables. As we 
have used the FVT to learn the complete process behavior and the user sessions 
contain no attacks, the intrusion-detection system should not generate an alarm.

Table 3. Table sizes of fixed-length patterns

Pattern type       (Mean) pattern size   Table size
Fixed-length                         6          396
Fixed-length                        10          702
Variable-length                     10           71



Table 4 shows the results obtained. The first column lists the number of 
unique sequences recorded. The second column specifies a value n that is used 
as a comparison value for columns three to five. Columns three and four give a 
measure of how well the normal user sessions could be matched with the entries 
of the respective fixed-length pattern tables, and column five does the same for 
the variable-length pattern table. 

To understand the content of columns three to five, we have to recall the 
meaning of the two metrics Sa and T. The metric Sa is the signal of anomaly 
defined in Section 2.2 and is used to differentiate between normal and abnor- 
mal behavior when fixed-length patterns are used. The values of Sa lie between 
0 and k, where k is the pattern size. The higher the value of Sa, the more 
likely it is that an intrusion is observed. The metric T is the number of consec- 
utive uncovered characters defined in Section 3.3 and is the metric used in our 
intrusion-detection system that is based on variable-length patterns. 

The entries in columns three and four list the number of sequences for which Sa
is equal to the comparison value n of the same row. For example, the row with
n = 4 indicates that for a window size of six (ten), we have observed a maximum 
of four uncovered characters in all subsequences of six (ten) characters. This has 
been seen in five (eight) out of 65 sequences. 

The last column lists the results obtained for the variable-length approach. 
Here, the row with n = 4 indicates that there are two sequences out of 65 where 
the maximum number of consecutive uncovered characters is 4. 



Table 4. Experimental results for normal user sessions

Number of            Fixed k = 6   Fixed k = 10   Variable
sequences     n      Sa = n        Sa = n         T = n
65            0      11            11             47
              1      19             0             14
              2      19            25              1
              3      11            15              1
              4       5             8              2
              5       0             4              0
              6       0             2              0
              > 6     -             0              0






In the ideal case, we would like to see Sa = 0 and T = 0, i.e. full coverage
of all test sequences. However, Table 4 shows that with the two fixed-length
approaches only 17% of the sequences, i.e. 11 out of 65, can be fully matched.
With the variable-length pattern approach, 72% of the sequences, i.e. 47 out
of 65, can be covered. We see that variable-length patterns result in a much
better coverage of the test sequences than fixed-length patterns.

Table 4 also allows us to set thresholds to differentiate between normal and 
abnormal behavior. Any value of Sa or T that is above the threshold would be 
considered a sign of an intrusion. If we do not want to issue false alarms, the 
threshold for Sa has to be set to four (six) in the case of fixed-length patterns. 
In the case of variable-length patterns, the threshold of T has to be set to four. 
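The thresholds and coverage figures can be recomputed from the counts in Table 4. In the sketch below, the threshold is taken as the largest value observed in the attack-free sessions, so that no normal session exceeds it; the row for n > 6 is omitted because it contains no sequences. The variable and function names are illustrative.

```python
N_VALUES = [0, 1, 2, 3, 4, 5, 6]

# Number of sequences per value of n, copied from Table 4.
TABLE4 = {
    "fixed k=6": [11, 19, 19, 11, 5, 0, 0],
    "fixed k=10": [11, 0, 25, 15, 8, 4, 2],
    "variable": [47, 14, 1, 1, 2, 0, 0],
}


def threshold(counts):
    """Largest n observed in normal sessions (no false alarm above it)."""
    return max(n for n, c in zip(N_VALUES, counts) if c > 0)


def full_coverage_pct(counts):
    """Share of sequences that are matched completely (n = 0)."""
    return 100 * counts[0] / sum(counts)


for name, counts in TABLE4.items():
    print(name, threshold(counts), round(full_coverage_pct(counts)))
```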

4.3 Attacks 

We have implemented seven attacks against the ftp service. Some of the attacks 
exploit server misconfigurations, others take advantage of vulnerabilities in older 
versions of the ftp daemon. 

The put forward attack consists of putting a .forward file in the home direc- 
tory of the ftp user, and then sending a mail to the ftp user. This vulnerability 
results from a misconfiguration of the ftp service because this directory should 
obviously not be world writable. 

The site exec suite of attacks exploits a vulnerability that was enabled by 
wrongly setting the _PATH_EXECPATH variable when compiling the ftpd program. 
Precompiled binaries containing this vulnerability were shipped with an older 
release of the Linux Slackware distribution. Two different attack scripts were 
executed, ftpbug and copy. To make the two scripts difficult to detect, they were
given the names ftpd and ls (hence the code names of the attacks).

The tar exec attacks use the option of the GNU tar program to
specify a compression program in combination with the tar command. The attack
becomes possible because some older versions of the ftp daemon do not release
their root privileges quickly enough before forking other processes. To exploit
this vulnerability, we let the tar program invoke renamed copies of the ftpbug
and copy programs as compression programs.

A detailed description of the attacks can be found in [5]. The results we 
obtained are shown in Table 5. 

We see that all three approaches can be used to detect the attacks. For the 
two fixed-length pattern tables, Sa is equal to the maximum value of six (ten) 
for all attacks. All the values obtained are above the threshold we have defined 
for Sa. For the variable-length method, the values for T vary between 13 and 
36. These values are significantly higher than the threshold we have set for T, 
namely four. 
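The comparison of the attack values with the threshold can be written down in a few lines. The T values are those reported in Table 5 and the threshold is the one derived in Section 4.2; the margin computed here is only an illustration of the gap between normal and attack sequences.

```python
# T values per attack for the variable-length method, copied from Table 5.
ATTACK_T = {
    "put forward": 36,
    "site copy": 18,
    "site exec copy ftpd": 16,
    "site exec copy ls": 18,
    "site exec ftpbug ftpd": 16,
    "tar exec ftpbug ftpd": 13,
    "tar exec ftpbug ls": 14,
}
T_THRESHOLD = 4  # threshold for T from Section 4.2

detected = {name: t > T_THRESHOLD for name, t in ATTACK_T.items()}
margin = min(ATTACK_T.values()) - T_THRESHOLD

print(all(detected.values()), margin)
```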

4.4 Discussion 

Table 5. Experimental results for attacks

Attack description       Fixed k = 6   Fixed k = 10   Variable
                         Sa            Sa             T
put forward               6            10             36
site copy                 6            10             18
site exec copy ftpd       6            10             16
site exec copy ls         6            10             18
site exec ftpbug ftpd     6            10             16
tar exec ftpbug ftpd      6            10             13
tar exec ftpbug ls        6            10             14

The quality of an intrusion-detection method is given by its capability to
differentiate between normal and abnormal behavior. For the intrusion-detection
methods we investigate, the differentiator is the threshold that has to be set
for the measures Sa and T. In the case of fixed-length patterns, the threshold
for Sa, i.e. four or six, is relatively high compared to the pattern length of six
or ten, and implies an increased risk of missing an attack. In the case of
variable-length patterns, we observe quite a difference between the threshold for
T, i.e. four, and the minimum value of T obtained for the attack sequences, namely
13. Therefore, the risk of issuing a false alarm is quite low if variable-length
patterns are used.

We conclude that intrusion-detection methods based on variable-length pat- 
terns can be more reliably used to differentiate between normal and abnormal 
behavior. This is mainly because variable-length patterns better match normal 
user sessions. 

5 Conclusions 

We have presented a host-based intrusion-detection system that can model the 
normal process behavior based on the audit sequences created on behalf of the 
process. The process model is a pattern table whose entries are subsequences of 
the audit event sequences determined during a training phase. 

Because the fixed-length pattern approach has certain limitations, including 
the inability to represent long, meaningful substrings, it appears to be more 
natural to use variable-length patterns to build the process model. We have 
developed a novel technique to generate tables of variable-length patterns auto- 
matically. To construct the patterns, the Teiresias algorithm, a method initially 
developed for discovering rigid patterns in unaligned biological sequences, is used 
in combination with a pattern-reduction algorithm. 

We have shown that the variable-length pattern model has several advantages 
over the fixed-length model. Fewer patterns are needed to describe the normal 
process behavior, and the quality of the results achieved is significantly better 
than that of the results obtained with fixed-length patterns. Our results also show
that behavior-based intrusion-detection systems can be built that do not suffer
from one of the main problems observed in behavior-based intrusion detection,
namely generating (too) many false alarms.

Future work will concentrate on validating our approach for other network 
services and on investigating techniques that would result in 100% coverage of 
normal user sessions. Furthermore, as our technique to build variable-length 
pattern tables has some similarities with techniques used for data compression, 
we plan to investigate the potential of this technology for intrusion-detection 
purposes. 

A Algorithms 

The pattern-reduction and pattern-matching algorithms have been briefly de- 
scribed in Sections 3.1 and 3.2, respectively. Here, we describe them in more 
detail. 

A.1 Terminology and Notation

Consider a finite set of characters $\Sigma = \{c_1, c_2, \dots, c_n\}$. The set
$\Sigma$ is called the alphabet. To denote a string of $n$, $n > 0$, consecutive
identical characters $c \in \Sigma$, we write $c^n$. $c^+$ denotes a string of
identical consecutive characters of arbitrary length $l$, $l > 0$.

The length of a string $s$ is written as $|s|$. We write $c \in s$ if the
character $c$ is contained in the string $s$.

Given is a set of strings $S = \{s_1, s_2, \dots, s_m\}$ over the alphabet
$\Sigma$. A substring $p$ that

— occurs at least twice in the set of strings $S$, and
— has a length $|p|$ of two or more characters

is called a pattern.

$p^n$ denotes the pattern $p$ repeated $n$, $n > 0$, times. $p^+$ denotes the
pattern $p$ repeated $l$, $l > 0$, times.

A pattern $p$ is maximal if there is no pattern $q$ for which holds that

— $p$ is a substring of $q$ with $|p| < |q|$, and

— the number of occurrences of the pattern $q$ in $S$ is equal to or larger than
the number of occurrences of the pattern $p$ in $S$.

A character $c \in s$ is said to be covered by the pattern $p$ if $c \in p$ and
$p$ is a substring of $s$.

A string $s$ is said to be covered by a set of patterns $P$ if for each character
$c$, $c \in s$, there is a pattern $p$, $p \in P$, such that $c$ is covered by $p$.

A set of strings $S$ is said to be covered by a set of patterns $P$ if each string
$s$, $s \in S$, is covered by $P$. Additionally, $P$ is said to cover $S$.

Given are a pattern $p$ and a string $s$. Let us decompose the string $s$ as follows:

$$s = p^l s' p^r, \quad l, r \geq 0, \quad |s'| \geq 0.$$

It is assumed that the decomposition is maximal, i.e., there is no $l'$ and $r'$
for which holds $l' + r' > l + r$.

The expression $(l + r) \cdot |p|$, i.e. the sum $l + r$ times the pattern length
$|p|$, is called boundary coverage of pattern $p$ and string $s$. It is written as
bCover($p$, $s$).

The boundary coverage of a pattern $p$ and a string set
$S = \{s_1, s_2, \dots, s_n\}$, written as bCover($p$, $S$), is defined as

$$\mathrm{bCover}(p, S) = \sum_{i=1}^{n} \mathrm{bCover}(p, s_i).$$



A.2 Pattern Reduction

Out of the set of patterns $P$ consisting of all the maximal patterns found for
the string set $S$, a subset of patterns $R$, $R \subset P$, is selected that
covers $S$. $\mu$ denotes the minimal pattern length that was used to generate
the set of maximal variable-length patterns. The reduced pattern set $R$ is
constructed as follows:

1. If $P = \emptyset$, then add all $s \in S$ to the reduced pattern set $R$ and exit.

2. For each $p \in P$ calculate bCover($p$, $S$).

3. Select a pattern $r \in P$ for which bCover($r$, $S$) is maximal, i.e., there is no
other pattern $p \in P$ for which holds:

— bCover($p$, $S$) > bCover($r$, $S$), or
— bCover($p$, $S$) = bCover($r$, $S$) $\wedge$ $|p| > |r|$.

4. Add $r$ to the reduced pattern set $R$ and remove it from $P$.

5. Remove all matching substrings adjacent to the beginning or end of a string,
i.e., remove strings of the form $s = r^+$, and replace strings of the form
$s = r^+ s'$, $|s'| \geq \mu$, or $s = s'' r^+$, $|s''| \geq \mu$, with $s'$ or
$s''$, respectively.

6. Remove the matching substrings that are not adjacent to the beginning or
end of a string, i.e., as long as there is an $s \in S$, $s = s' r s''$,
$|s'| \geq \mu$, $|s''| \geq \mu$, replace $s$ with the two strings $s'$ and $s''$.

7. If there is an $s \in S$ with length $|s| < 2 \cdot \mu$, remove $s$ from the
set of strings $S$ and add it to the pattern set $P$.

8. If $S \neq \emptyset$, go to Step 1, otherwise exit.

A.3 Pattern Matching

At certain points of the pattern-matching process, there may be several patterns
that match the input stream. To decide which pattern to select, the algorithm
uses a look-ahead approach. A pattern is selected if $\delta$, $\delta > 0$,
patterns can be found that match the continuation of the string. We designate
$\delta$ as look-ahead parameter. An alarm is raised if the number of consecutive
uncovered characters exceeds a threshold $\tau$.

The pattern matching is done as follows:

1. Set the look-ahead parameter to a value $\delta > 0$, and set the threshold for
the number of consecutive uncovered characters to a value $\tau > 0$.

2. Set the counter of consecutive uncovered characters, $\kappa$, to 0.

3. When there is a sufficient number of characters in the input stream $I$, find a
pattern $p \in P$ that covers the beginning of the input stream $I$. If no pattern
can be found, go to Step 6.

4. Find $\delta > 0$ patterns $q_1, q_2, \dots, q_\delta$, such that the string
$t = p q_1 q_2 \dots q_\delta$ covers the beginning of the stream. If there are
$\epsilon$ patterns $q_1, q_2, \dots, q_\epsilon$, $0 < \epsilon < \delta$,
that cover the entire input sequence, set $t = p q_1 q_2 \dots q_\epsilon$.

(a) If $t$ matches the entire input sequence, remove it and go to Step 2.

(b) If $\delta$ patterns can be found that cover the beginning of the input stream,
remove pattern $p$ from the input stream, and go to Step 2.

5. Determine all pattern combinations that match the beginning of the input
stream. If there is a match, select the pattern combination that covers the
longest input sequence, remove it from the input stream, and go to Step 2.

6. Skip one character, and increase $\kappa$ by 1.

7. If $\kappa = \tau + 1$, raise an alarm.

8. Go to Step 2.

Abstract. This paper presents an approach to the intrusion detection
problem applied to CORBA-type distributed environments. The approach
is based on measuring the deviation from client reference behaviors
towards the CORBA servant objects to be protected. We consider a client
behavior as a sequence of invoked requests between each client-server
pair, during each connection of the observed client.
We construct, during a training period, a client behavior model based
on a tree representation with variable-length branches. This model takes
into account both the series of invoked requests and their parameter values. To
make our approach more flexible, we construct, at the end of the training
period, a tolerance interval for each numerical parameter. These intervals
allow deviation between observed and learned values to be measured.
This article presents our preliminary results and introduces our future
work.



1 Introduction 

CORBA (Common Object Request Broker Architecture) is a distributed
architecture which enables heterogeneous objects to communicate freely regardless of
the hardware, OS and programming languages of the interacting objects, thanks
to a software bus called the ORB (Object Request Broker). In spite of the preventive
security mechanisms in CORBA (authentication, authorisation, etc.), it may
still be possible to take advantage of ORB vulnerabilities to perform attacks.¹ It
is therefore necessary to make use of an intrusion detection mechanism, which
implies a permanent surveillance of the exchanges between objects to make sure
of their legitimacy.

There are two approaches in intrusion detection: misuse detection and
anomaly detection. Misuse detection searches for known attacks in the event
logs. It implies preliminary knowledge of CORBA attacks, which is hard to
establish a priori. We therefore decided on the anomaly approach, which models the
behaviors of CORBA clients involved in a communication in order to detect
later deviations from reference behaviors, called “normal” behaviors. To construct
our behavior base from the information collected, we need to decide on the data
used to characterize a normal behavior. In our approach, the requests invoked during
each CORBA client connection² are considered the most relevant data for a
client behavior definition. We think that the resulting variable-length sequences
of invoked requests accurately express the real client behavior towards the CORBA
objects involved [1]. Currently, we only consider the order of requests in a sequence;
the frequency of request invocation is not taken into account.

* This work is partly funded by The France Telecom R&D Center. We would like to
thank especially Anne Lille, Eric Malville, and Michel Milhan for many interesting
discussions.

¹ With respect to the threats and attacks, we don’t know of any published work.

Our work focuses on intrusion detection at the application level. Detection
is based on an application audit source generated by message interceptors.
These interceptors, provided by the CORBA environment, are able to spot, modify
and redirect every message passing through the ORB. We used these mechanisms
to collect information about observed objects in order to construct event logs.
Analysis of the logs allows intrusions to be detected.

Some work in the context of intrusion detection in CORBA-type distributed
environments was conducted by Odyssey Research Associates [3]. The approach
considers the messages exchanged between clients as the discriminant data for the
definition of a normal behavior. They only consider the signatures of the
intercepted client requests. The parameters are not taken into account. Successive
observed signatures allow the construction of behavior patterns (a pattern
consists of a series of fixed-length consecutive calls organized into a behavior base).
During the real activity of the observed client, the detection algorithm (called
sliding window algorithm) can determine whether the observed behavior is
consistent with the past. Each pattern found in the base is considered normal; all
the rest is not. The number of observed anomalies and the elapsed time between
these anomalies are decisive for the computation of an anomaly value. If the
computed value exceeds a certain threshold during a certain time, an alert is
raised.
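The sliding-window idea can be sketched in a few lines. This is a generic illustration and not the implementation of [3]: the request names and the window size are hypothetical, and the anomaly value based on the elapsed time between anomalies is not modeled, only the number of windows that are missing from the behavior base.

```python
def build_base(trace, k):
    """Behavior base: all windows of k consecutive calls seen in training."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def count_anomalies(trace, base, k):
    """Number of windows in the observed trace that are not in the base."""
    return sum(tuple(trace[i:i + k]) not in base
               for i in range(len(trace) - k + 1))


# Hypothetical request signatures, used for illustration only.
base = build_base(["open", "read", "read", "close"], k=2)
print(count_anomalies(["open", "read", "close"], base, k=2))
print(count_anomalies(["open", "write", "close"], base, k=2))
```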

We are also interested in CORBA client behavior modeling. We propose a
tree representation for the client behavior. Information about client request
messages is collected at the server side, but the client tree can be hosted elsewhere.
The tree root marks a connection, a leaf marks a disconnection. Each node
corresponds to an invoked request. Each branch represents a normal behavior
observed between a connection and a disconnection. Contrary to [3], we consider,
in addition to request signatures, their parameter values. Each node contains
information about request parameters (see section 3). We construct tolerance
intervals around these values which will be used by our detection algorithm (see
section 4).

² If the server is made of one object (which is the case in our current application), the
connection and disconnection correspond respectively to bind and unbind between
client and servant objects. If the server is made of several objects (which is the case
in real applications), the definition of connection and disconnection becomes more
complicated. We have not yet worked on this point.

In short, we propose to generate, during the training period, a tree repre- 
sentation of a client behavior. At the end of this period, we construct tolerance 
intervals. During the detection period we navigate through the tree starting from 
the root and compute a similarity degree. The observed behavior is considered 
as normal if we find a leaf with an acceptable value of the similarity degree. 
Otherwise, an alert is raised. 
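The structure of such a behavior tree can be sketched as a prefix tree over request names. The sketch is a simplification made under explicit assumptions: parameter values, tolerance intervals and the similarity degree are left out, so a session is accepted only if it follows a learned branch exactly up to a disconnection. The class, function and request names are hypothetical.

```python
class Node:
    """One invoked request in the behavior tree."""

    def __init__(self):
        self.children = {}          # next request name -> child node
        self.ends_session = False   # True if a learned session ended here


def train(sessions):
    """Build the tree from request sequences observed during training."""
    root = Node()  # the root marks the connection
    for session in sessions:
        node = root
        for request in session:
            node = node.children.setdefault(request, Node())
        node.ends_session = True  # disconnection after the last request
    return root


def is_normal(root, session):
    """True if the session follows a learned branch up to a disconnection."""
    node = root
    for request in session:
        node = node.children.get(request)
        if node is None:
            return False
    return node.ends_session


tree = train([["login", "balance", "logout"], ["login", "deposit", "logout"]])
print(is_normal(tree, ["login", "deposit", "logout"]))
print(is_normal(tree, ["login", "withdraw"]))
```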

We have developed a testbed which is the platform for testing our ideas.
It consists of a simple banking application. This platform doesn’t constitute
a real application with real CORBA users in interaction, but it allowed us to
test the whole proposed approach (interception, training, detection). The results
presented in this article are obtained from this platform.

The first part of this article describes how information about CORBA
client-server interactions is acquired. The second part details the training
period. The third part deals with the detection period. The last part shows the
preliminary experimental results. We finally present our conclusions and future
work.

2 Acquisition of Information on Client-Server 
Interactions 

This part gives a brief summary of CORBA security services, describes the basic
surveillance features and presents a communication architecture based on these
features.

The OMG defines CORBA security. First, it offers some security services
(cf. chapter 15 of [4]). Second, it provides interceptors, which constitute the
basic mechanism for implementing security or any other service placed in the
client-server communication path (see chapter 21 of [5]).

Three security service levels are defined: 

- first level services: authentication, access control, confidentiality and audit
are provided to CORBA applications without any possibility to read related
parameters or to have special security knowledge;

- second level services: add protection against replay and offer access to a
security administration interface;

- non-repudiation, which is an optional service.

Although security is specified, we are still far from a portable implementation
in commercial ORBs, especially for the audit service. That is why we decided to
use interceptors to acquire information on client-server interactions.

Interceptors can plug specific processing into the request paths to CORBA
objects, on the client side, the server side, or both. The OMG presents this
extension as the basis for services that observe and transform requests and
responses without any impact on the ORB kernel, which simply calls a standard
interface.

There are two levels of interceptors: request and message interceptors.
Request interceptors receive the request parameters, can modify them, invoke other
objects and then redirect the request. We use these interceptors because
message interceptors access the fragments carrying requests and responses, which
are of no interest to us. Moreover, Visibroker (the ORB used for our testing
model) only offers request level interceptors^. A client and/or server side
interceptor is installed at the client-server connection (bind).

The proposed architecture for our platform is shown in Figure 1. The choice 
of this architecture is based on these important factors: 

- the interception process has no impact on the client, which can run in an
insecure context;

- servant objects can be fitted with interceptors without any code modification;

- there is a clear distinction between a capture agent, which holds the
interceptors, and an administrator; they communicate through the ORB and can
be hosted on different machines.



Fig. 1. Architecture and principles of object exchanges (the figure labels
the server site)



Given this architecture, the client surveillance follows these steps:

1. start of the administrator, which observes the agent registrations;

2. start of the server, which initializes the ORB and loads the agent, without
being aware of it, thanks to a command line option;

^ Visibroker for Java version 3.3 provides three classes of request level interceptors [2]:
BindInterceptor, ClientInterceptor and ServerInterceptor.






Zakia Marrakchi et al. 



3. the loaded agent initializes the interception mechanism and registers itself
with the administrator;

4. server requests are not captured until the administrator explicitly asks for
a trail;

5. when the administrator asks for an interception, information is
communicated from the ORB to the agent, which carries it to the administrator.

The client surveillance is made during the training period and the resulting 
audit file constitutes the starting point for both the client behavior modeling 
and the detection period presented in the next two parts. 
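The registration and on-demand capture steps listed above can be sketched as follows. This is only an illustration in Python, not Visibroker's interceptor API: the class names `Administrator` and `CaptureAgent`, their methods and the record format are all hypothetical, and the ORB transport between agent and administrator is replaced by direct method calls.

```python
class Administrator:
    """Collects audit records from registered capture agents (hypothetical)."""

    def __init__(self):
        self.agents = []
        self.audit_trail = []

    def register(self, agent):
        # Step 3: a freshly loaded agent announces itself.
        self.agents.append(agent)

    def request_trail(self):
        # Steps 4 and 5: capture starts only on explicit request.
        for agent in self.agents:
            agent.capturing = True

    def receive(self, record):
        self.audit_trail.append(record)


class CaptureAgent:
    """Stands in for the server-side holder of the interceptors (hypothetical)."""

    def __init__(self, administrator):
        self.administrator = administrator
        self.capturing = False
        administrator.register(self)

    def on_request(self, operation, params):
        # Called for every intercepted request; forwards it only while capturing.
        if self.capturing:
            self.administrator.receive((operation, params))


admin = Administrator()
agent = CaptureAgent(admin)
agent.on_request("Bank::Account::deposit", (200.0,))   # not captured: no trail requested
admin.request_trail()
agent.on_request("Bank::Account::withdraw", (50.0,))   # captured and forwarded
print(admin.audit_trail)
```

Keeping the capture flag in the agent mirrors the separation described in the text: the server code is never touched, and the administrator alone decides when requests start being recorded.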



3 The Training Period 

During each client connection, request messages to CORBA objects are observed 
in order to construct their behavior model. During this period which is transpar- 
ent to clients, the learned behaviors are considered as attack-free. In the behavior 
modeling, we consider a series of requests and their parameter values between 
each connection and disconnection (see section 3.1). Further, we construct tol- 
erance intervals around observed values (see section 3.2). These intervals offer 
flexibility in the measure of deviation from learned behavior, called reference 
behavior. This measure is performed in the detection period (see section 4). 

3.1 Structure of the Behavior Base 

Our behavior model is based on a tree representation with variable-length
branches. The tree root marks a connection, a leaf marks a disconnection. A tree
node corresponds to a request and its parameters. A tree branch is thus a normal
behavior observed between a connection and a disconnection (see Figure 2).
During the training period, we represent each client behavior by such a tree.
The set of all trees constitutes the behavior base.

The tree is constructed by inserting branches. We approximate the final state
of the tree by adding branches until N successive behaviors are already known.
This approximation can only be established empirically, however. The
training period and the value of N depend on the complexity of the application:
the richer the application, the longer the training period. For a real application,
several weeks and even several months are probably required.
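The insertion of branches and the stopping rule based on N can be sketched as below. This is a minimal Python illustration under stated assumptions: the names `Node`, `insert_branch` and `train` are hypothetical, a session is a list of (request, parameters) pairs, and a behavior counts as "already known" when its sequence of requests is already a branch of the tree, regardless of parameter values.

```python
class Node:
    """One request of the tree, the parameter values seen for it, and its children."""

    def __init__(self, request):
        self.request = request
        self.values = []      # parameter tuples observed at this node
        self.children = {}    # request name -> Node


def insert_branch(root, session):
    """Insert one session; return True if the whole branch was already in the tree."""
    node, known = root, True
    # The explicit "disconnection" child marks the leaf of the branch.
    for request, params in session + [("disconnection", ())]:
        if request not in node.children:
            node.children[request] = Node(request)
            known = False
        node = node.children[request]
        node.values.append(params)
    return known


def train(sessions, n):
    """Insert sessions until n successive ones add no new branch."""
    root, streak = Node("connection"), 0
    for session in sessions:
        streak = streak + 1 if insert_branch(root, session) else 0
        if streak >= n:
            break
    return root


sessions = [
    [("deposit", (200.0,)), ("getBalance", ())],
    [("deposit", (100.0,)), ("withdraw", (100.0,))],
    [("deposit", (150.0,)), ("getBalance", ())],
    [("deposit", (120.0,)), ("getBalance", ())],
]
tree = train(sessions, n=2)
print(sorted(tree.children["deposit"].children))
```

The parameter values stored at each node are what the tolerance intervals are later built from.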

In our behavior model, only the order of invoked requests is considered in
the definition of normal behavior. Timing is not taken into account for the
moment. In addition to invoked requests, a behavior is defined by their
parameter values. A large gap between learned and observed values of
a parameter can reveal an anomaly. Nevertheless, a reasonable spread
around learned values should be accepted. To allow small variations around
normal values, we construct tolerance intervals. These intervals treat
surrounding values as acceptable even if they have never been observed in the past.

Fig. 2. Construction of the behavior base (node labels visible in the figure:
C(x), F(x,y,z))



3.2 Construction of Tolerance Intervals 

The difficulty here is to determine a threshold from which the observed value 
must be considered too different from those learned. In other words, the problem 
is to define a measure of the gap between what has been learned and what is ob- 
served. An analysis of what is learned is essential to construct the corresponding 
tolerance intervals. Intervals are defined only for numerical data. For symbolic 
data, such as the connected client name, only observed values can be accepted. 

The first step of data analysis is to represent in clusters (as shown in the
first part of Figure 4 for the numerical parameter x) the number of occurrences
of all observed values for each numerical parameter. The second step consists of
isolating clusters of learned values. These clusters may possibly be reduced to
isolated values.
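The two steps above can be illustrated on the values quoted in the caption of Figure 4. The text states that this step is currently done by hand, so the rule used here is purely an assumption for illustration: a new cluster starts whenever two consecutive sorted values differ by more than a hypothetical `max_gap`.

```python
def cluster_values(values, max_gap):
    """Group numeric values; a gap larger than max_gap starts a new cluster.

    Returns one (lowest, highest, count) triple per cluster.
    """
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > max_gap:
            clusters.append([value])      # too far from the previous value
        else:
            clusters[-1].append(value)
    return [(c[0], c[-1], len(c)) for c in clusters]


observed = [10, 12, 14, 14, 17, 18, 20, 20, 150, 155, 160, 172, 200]
print(cluster_values(observed, max_gap=30))
# [(10, 20, 8), (150, 200, 5)]
```

With `max_gap=30` the sketch recovers the two intervals [10,20] and [150,200] named in the caption; a smaller gap would split the second cluster, which is one reason the choice is context dependent.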

Further, we associate with each cluster a tolerance interval whose width $\delta$
depends on the security level associated with the request. For example, for a
parameter such as the amount of money taken from a bank account, only a small
variation around learned values is allowed, so $\delta$ is small. The analysis may
be improved by adapting the degree of deviation $\delta$ for each numerical
parameter within the same request. In fact, $\delta$ can also depend on the width
and the density of the associated cluster. We think that for large clusters
(values frequently observed) we should allow more deviation than for isolated
values. Of course, these assumptions should be tested on real data. Currently,
this step is done by hand, but we are working on automating this process using
a context-dependent distance criterion.

The tolerance interval is represented by a trapezoidal function expressed by
the following formula (see Figure 3):

$$
va(x) =
\begin{cases}
0 & \text{for } x < x_1 \text{ or } x > x_4 \\
1 & \text{for } x \in [x_2, x_3] \\
\dfrac{x - x_1}{x_2 - x_1} & \text{for } x \in [x_1, x_2[ \\
\dfrac{x_4 - x}{x_4 - x_3} & \text{for } x \in \,]x_3, x_4]
\end{cases}
\qquad (1)
$$




Fig. 3. Trapezoidal representation of an interval. Points $x_2$ and $x_3$ delimit
the cluster of observed values. Points $x_1$ and $x_4$ express the authorized gap
$\delta$ around $x_2$ and $x_3$.



The constructed interval allows, as explained previously, the client to deviate 
from his learned behavior with a certain authorized degree. Moreover, it limits 
the false alert rate during detection by providing a computed degree of similarity. 
This notion will be defined in the next part. 
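The trapezoidal function of equation (1) translates directly into code. The sketch below assumes, following the caption of Figure 3, that $x_1 = x_2 - \delta$ and $x_4 = x_3 + \delta$; the function name `acceptance` and the value $\delta = 10$ used in the example are illustrative choices, not values taken from the experiments.

```python
def acceptance(x, x2, x3, delta):
    """Trapezoidal acceptance value for a cluster [x2, x3] widened by delta."""
    x1, x4 = x2 - delta, x3 + delta
    if x < x1 or x > x4:
        return 0.0                       # outside the tolerance interval
    if x2 <= x <= x3:
        return 1.0                       # inside the learned cluster
    if x < x2:
        return (x - x1) / (x2 - x1)      # rising edge on [x1, x2[
    return (x4 - x) / (x4 - x3)          # falling edge on ]x3, x4]


# Cluster [150, 200] with an assumed gap of 10
for value in (120, 148, 175, 205):
    print(value, acceptance(value, 150, 200, 10))
```

A value just outside the cluster, such as 148, receives a partial acceptance value instead of being rejected outright, which is the flexibility the interval is meant to provide.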

4 The Detection Period 

During the detection period, the CORBA application client is observed between
each connection and disconnection. The objective is no longer to learn the
client's behavior but to decide on its legitimacy, by computing a degree of
similarity $d_i$. This degree measures the deviation between learned and observed
behaviors.

During each client connection, we navigate through the tree and we adjust 
the similarity degree at each node until a leaf is reached. The computed degree 
allows us to decide whether the observed client behavior for the connection i 
is normal or not. If the degree is less than a first acceptance threshold, called 
instantaneous threshold (s), the client behavior is deemed anomalous and an 
instantaneous alert is raised (see Figure 6). 

It is also interesting to observe the client behavior during a longer period
covering several connections. In fact, small variations around learned values may
reveal, over a certain period, an anomalous behavior. We propose to observe,
during successive connections, the obtained values of the similarity degree $d_i$.
We use a second threshold, called composed threshold ($s'$), to decide on the
activation of a second alert, called composed alert.

If the degree $d_i$ stays below this second threshold ($s'$) for a time $\delta t$
greater than $\Delta t$, the client behavior is deemed anomalous and a composed
alert is raised (see Figure 6). This alert shows possible correlations between
several connections that may constitute an attack.

Section 4.1 details the computation of the similarity degree. Section 4.2 shows
the two types of generated alerts.

Fig. 4. Flexible representation of the cluster. We consider, for example, the
values of the parameter x in the request F(x). The observed values for x during
the training period are: 10, 12, 14, 14, 17, 18, 20, 20, 150, 155, 160, 172, 200.
We notice that these values are distributed in two intervals, [10,20] and
[150,200]. During the detection period, an observed value of 148 should not be
considered anomalous because it is close to the learned values ([150,200]).

4.1 Computation of the Similarity Degree 

The observed behavior is considered anomalous if its similarity degree is less
than a certain instantaneous threshold $s$ (a low threshold tolerates a low
correspondence between observed and learned behavior). The lower the threshold,
the higher the number of accepted behaviors. Increasing the threshold limits
the number of accepted behaviors, but increases the rate of false alerts. The
problem is to find a compromise between detection efficiency and an acceptable
rate of false alerts. This compromise is currently obtained experimentally (see
section 5).

The computation of the similarity degree $d_i$ for the connection $i$ begins at
the tree root with $d_i^0 = 1$. This degree is then adjusted at each visited node
by applying a possible penalty. This penalty depends on the acceptance values
$va_i$ obtained for each numerical parameter of the request (cf. definition in
section 3.2). The value of $d_i$ at the current node $k$ ($d_i^k$) is expressed in
terms of its value at the previous node ($d_i^{k-1}$):

$$ d_i^k = d_i^{k-1} - P(va_1, \ldots, va_n) \quad (k \neq 0) \qquad (2) $$

$P(va_1, \ldots, va_n)$ expresses the global penalty applied to the current
request. This penalty is obtained by the conjunction of the elementary penalties
$p_i(va_i)$ computed for each numerical parameter of the request, so that all
elementary values contribute to the penalty $P$ applied to the request. We must
choose a conjunction operator among three candidates: min, max and average. The
min operator computes the minimum of the $p_i(va_i)$. The major drawback of
min is its high sensitivity to null values: a single null elementary penalty
implies a null penalty $P$ for the request, which may then be wrongly accepted.
In contrast to min, the max operator applies to the request the highest
elementary penalty observed among the parameters, by taking the maximum of all
the $p_i(va_i)$, which is restrictive. We therefore chose the average operator.
Unlike the previous ones, this operator has a compensatory effect that implies
a low sensitivity to null values. The global penalty is expressed in terms of
the elementary penalties $p_i$ as follows:

$$ P(va_1, \ldots, va_n) = \frac{\sum_{i=1}^{n} p_i(va_i)}{n} \qquad (3) $$
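The difference between the three candidate operators is easy to see on a small numerical case. The sketch below takes a list of elementary penalties as given; the function name `global_penalty` and the sample values are illustrative assumptions.

```python
def global_penalty(penalties, operator="average"):
    """Combine elementary penalties with one of the three candidate operators."""
    if operator == "min":
        return min(penalties)
    if operator == "max":
        return max(penalties)
    return sum(penalties) / len(penalties)   # average, as in equation (3)


# One parameter is perfectly normal (penalty 0), another is fully anomalous (penalty 1).
penalties = [0.0, 0.5, 1.0]
for operator in ("min", "max", "average"):
    print(operator, global_penalty(penalties, operator))
```

With min, the single null penalty hides the anomalous parameter entirely; with max, the single worst parameter dictates the result; the average lets every parameter contribute.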



The elementary penalty $p_i$ of the parameter $i$ is obtained in terms of the
acceptance value $va_i$ of this parameter. This penalty decreases when $va_i$
increases, starting from the point $(va_i, p_i) = (0,1)$ down to the point $(1,0)$
as shown in Figure 5. If the observed value of a parameter is not found in the
previously defined interval, the acceptance value $va_i$ of this parameter is low.
Consequently, the associated elementary penalty $p_i$ is high and the degree $d_i$
decreases. The higher the acceptance value, the lower the penalty and the less
the similarity degree will decrease. The elementary penalty is expressed by the
following formula:

$$ p(va) = (1 - va)^{\alpha} \qquad (4) $$

This function allows the control of the curve slope by varying the degree $\alpha$.
To illustrate this, let us consider the same example as in the training period.
Suppose that the first request invoked by the client is F(x) with x = 140. Its
acceptance value is then equal to 0.4 (see Figure 4). Table 1 shows the variation
of the similarity degree in terms of the penalty. Our preliminary results are
based on a quadratic penalty function ($\alpha = 2$) applied to all invoked
requests. We propose, in our future work (see section 6), to vary the degree
$\alpha$ for each parameter according to its sensitivity with respect to the
application. A high degree $\alpha$ expresses the acceptance of a larger variation
of the associated parameter.
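The example with an acceptance value of 0.4 can be reproduced numerically. The sketch below combines the elementary penalty of equation (4), the average of equation (3) and the update of equation (2). The function names are illustrative, and two details are assumptions rather than statements from the text: the degree is floored at 0, and a request without numerical parameters receives no penalty.

```python
def elementary_penalty(va, alpha=2):
    """Equation (4): penalty for one parameter with acceptance value va."""
    return (1.0 - va) ** alpha


def update_degree(previous, acceptance_values, alpha=2):
    """Equations (2) and (3): subtract the averaged penalty from the degree."""
    if not acceptance_values:
        return previous                  # assumption: nothing to penalise
    penalties = [elementary_penalty(va, alpha) for va in acceptance_values]
    penalty = sum(penalties) / len(penalties)
    return max(0.0, previous - penalty)  # assumption: the degree never goes below 0


# First request F(x) with acceptance value 0.4, starting from a degree of 1
for alpha in (2, 4, 8):
    p = elementary_penalty(0.4, alpha)
    d = update_degree(1.0, [0.4], alpha)
    print(alpha, round(p, 4), round(d, 4))
```

The printed values follow the pattern of Table 1 up to rounding: the larger the exponent, the smaller the penalty for the same acceptance value.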




Fig. 5. Penalty function for the acceptance values (horizontal axis: va values)

Table 1. Variation of $d_i$ in terms of $p_i(va_i)$

$\alpha$    $p_i(va_i) = (1 - 0.4)^{\alpha}$    $d_i = 1 - P(va)$
2           0.36                                0.64
4           0.1296                              0.8704
8           0.0168                              0.9832

4.2 The Detection Results

The detection algorithm provides two types of alerts. First, at each client
connection, we perform a test which can activate an instantaneous alert. This test
consists in verifying whether we reached a tree leaf or not. If we did and
the computed value of $d_i$ during the connection $i$ is higher than the
threshold $s$, no alert is raised. Otherwise, an instantaneous alert is raised. When
the client disconnects without reaching a leaf, we consider that this behavior is
intrusive and we activate an instantaneous alert. During the navigation through
the tree branches, the degree $d_i$ may drop to zero in two cases: either the
client has accumulated several suspect requests, which implies a null value of
$d_i$, or the client invoked an unexpected request in the path already selected in
the tree. In the latter case, we immediately set the similarity degree to zero
because we consider each unexpected request as intrusive, which can be
restrictive. We propose in the future to take into account request insertions
and deletions in the detection period in order to authorize a certain deviation.
In this case, the similarity degree will be decreased but no longer set to zero.
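The instantaneous test described above can be sketched as a walk through the tree. Everything in this Python illustration is an assumption made for the sake of a self-contained example: the tree is a nested dictionary in which the key "disconnection" marks a leaf, the hook `penalty` stands in for the penalty computation of section 4.1, and the threshold value 0.7 is arbitrary.

```python
def detect(tree, session, penalty, s):
    """Return (degree, alert) for one connection.

    tree    -- nested dicts: request name -> subtree; "disconnection" marks a leaf
    session -- list of (request, params) observed during the connection
    penalty -- callable(request, params) -> penalty in [0, 1] (hypothetical hook)
    s       -- instantaneous threshold
    """
    node, degree = tree, 1.0
    for request, params in session:
        if request not in node:
            return 0.0, True                 # unexpected request: degree set to zero
        degree = max(0.0, degree - penalty(request, params))
        node = node[request]
    reached_leaf = "disconnection" in node
    # No alert only if a leaf was reached and the degree is higher than s.
    return degree, (not reached_leaf) or degree <= s


tree = {"deposit": {"getBalance": {"disconnection": {}}}}


def penalty(request, params):
    # Toy rule (assumption): a deposit above 500 is penalised by 0.36.
    return 0.36 if request == "deposit" and params[0] > 500 else 0.0


print(detect(tree, [("deposit", (200.0,)), ("getBalance", ())], penalty, s=0.7))
print(detect(tree, [("deposit", (900.0,)), ("getBalance", ())], penalty, s=0.7))
print(detect(tree, [("deposit", (200.0,)), ("withdraw", (50.0,))], penalty, s=0.7))
print(detect(tree, [("deposit", (200.0,))], penalty, s=0.7))
```

The four calls cover the cases discussed in the text: a normal connection, a connection whose degree falls below the threshold, an unexpected request, and a disconnection before a leaf is reached.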

We continue the observation of a client behavior during the following
connections in order to detect a series of anomalous connections. A composed alert
is raised if the similarity degree $d_i$ stays below a composed threshold $s'$ during
a time $\delta t$ longer than $\Delta t$. Figure 6 shows the evolution of $d_i$ over
time and illustrates the two types of alerts. We note that during $\delta t_1$, $d_i$
stays below $s$ and $s'$, which activates an instantaneous alert at each
disconnection but no composed alert, as $\delta t_1$ is less than $\Delta t$. During
$\delta t_2$, $d_i$ stays below $s'$, which activates a composed alert as $\delta t_2$ is
longer than $\Delta t$. During the same interval of time $\delta t_2$, there is no
instantaneous alert as $d_i$ stays higher than $s$.
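The composed alert can be sketched as a scan over the degrees obtained at successive connections. This Python illustration rests on several assumptions: each connection is summarised by a (time, degree) pair, the names `composed_alerts`, `s_prime` and `delta_t_max` are hypothetical, and the observation window restarts after an alert has been raised.

```python
def composed_alerts(history, s_prime, delta_t_max):
    """Return the times at which a composed alert is raised.

    history     -- chronological list of (time, degree), one entry per connection
    s_prime     -- composed threshold s'
    delta_t_max -- duration the degree may stay below s' before alerting
    """
    alerts, run_start = [], None
    for time, degree in history:
        if degree < s_prime:
            if run_start is None:
                run_start = time             # the degree has just dropped below s'
            if time - run_start > delta_t_max:
                alerts.append(time)
                run_start = None             # assumption: start a new window
        else:
            run_start = None                 # back above s': the run is broken
    return alerts


history = [(0, 0.9), (1, 0.6), (2, 0.65), (3, 0.9),
           (4, 0.6), (5, 0.62), (6, 0.61), (7, 0.6), (8, 0.95)]
print(composed_alerts(history, s_prime=0.7, delta_t_max=2))
# [7]
```

In this example the first run below the threshold (times 1 to 2) is too short to trigger anything, while the second run (times 4 to 7) lasts long enough to raise a composed alert, even though every degree in it could still be above the instantaneous threshold.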

For want of a real application, no reaction strategy is expected regarding this 
situation for the moment. Various reaction possibilities against intrusive detected 
behavior will have to be studied in the future for a real CORBA application. 

5 Preliminary Experimental Results 

We tested our approach on the developed testbed, which gave encouraging
results.

The learned behavior is made of a series of basic operations on a bank
account (account creation, payment, withdrawal, balance consultation, etc.). The
IDL specification of these operations is the following:

module Bank {

    // Operations on a single bank account
    interface Account {
        float getBalance();
        float deposit(in float sum);
        float withdraw(in float sum);
    };

    // Account creation, closing and lookup by client name
    interface AccountManager {
        void open(in string clientName);
        void close(in string clientName);
        Account search(in string clientName);
    };

};

Fig. 6. Generated alerts (legend: instantaneous alert, combined alert)



The collected data is contained in an audit trail file. After filtering, the 
audit trail file contains information about the object invoked, the request and 
corresponding parameters, as shown in the following example : 

connection
Bank::AccountManager::open(string bv)
Bank::AccountManager::search(string bv)
Bank::Account::deposit(float 200.0)
Bank::Account::getBalance()
Bank::Account::withdraw(float 100.0)
Bank::Account::withdraw(float 50.0)
Bank::Account::getBalance()
Bank::Account::withdraw(float 20.0)
Bank::AccountManager::search(string zm)
Bank::Account::getBalance()
disconnection
connection
Bank::Account::deposit(float 100.0)
Bank::Account::withdraw(float 100.0)
Bank::Account::getBalance()
Bank::AccountManager::open(string lm)
Bank::AccountManager::search(string lm)
Bank::Account::getBalance()
disconnection
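An audit trail in this form can be split into sessions with a few lines of code. The sketch below is an assumption about how such a file could be read, not the tool used for the experiments: it expects one request per line in the form Interface::operation(type value), at most one parameter per request, and the markers "connection" and "disconnection"; the names `REQUEST` and `parse_audit_trail` are hypothetical.

```python
import re

# Operation name, then an optional "type value" pair between parentheses.
REQUEST = re.compile(r"^(?P<name>[^()]+)\((?P<args>[^()]*)\)$")


def parse_audit_trail(text):
    """Split an audit trail into sessions of (operation, value) pairs."""
    sessions, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "connection":
            current = []
        elif line == "disconnection":
            if current is not None:
                sessions.append(current)
            current = None
        else:
            match = REQUEST.match(line)
            if match and current is not None:
                # Drop stray spaces inside the qualified name.
                name = "".join(match.group("name").split())
                args = match.group("args").split()
                value = None                          # request without parameter
                if len(args) == 2:
                    value = float(args[1]) if args[0] == "float" else args[1]
                current.append((name, value))
    return sessions


trail = """connection
Bank::Account::deposit(float 100.0)
Bank::AccountManager::search(string zm)
Bank::Account::getBalance()
disconnection
"""
print(parse_audit_trail(trail))
```

Each returned session is the sequence of requests between a connection and a disconnection, which is the unit the behavior tree is built from.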





Fig. 7. Variation of the similarity degree

The first tests consist in applying small variations to the client behavior to test
the relevance of the tolerance intervals previously defined. Results show that
the number of raised alarms depends on the authorized gap $\delta$ (see Figure 7). We
are working on the automatic adjustment of this gap according to the sensitivity
of the associated parameter. Further studies will focus on varying the gap
$\delta$ for each cluster of values of a given parameter.

Once the gap $\delta$ was fixed, we made a second series of tests aimed at
adapting the threshold $s$ to what has been learned. The number of raised
alarms also depends on this threshold, which depends on the context. In fact,
this threshold must vary according to both the complexity of the application and
the variation of the learned client behavior. To carry on these studies, a real
CORBA application is needed. We are waiting for such an application.

Currently, we use previously fixed values of the thresholds $s$ and $s'$ and the
gap $\delta$ to decide on the activation of alarms. However, these parameters are
context dependent. We hope to have a real application available soon to test our
approach and to conduct further studies on these parameters.

6 Conclusion 

We presented in this article our approach to the intrusion detection problem
and a tree representation for the modeling of client behavior. This approach
was tested in a CORBA environment. The detection algorithm is based on the
measure of deviation between observed and learned behaviors. Given the
authorized degree of deviation for each parameter of the invoked requests, the
algorithm computes a similarity degree. This degree is computed during each
connection and allows us to decide on the activation of an alarm with respect to
a fixed threshold. The surveillance of the client during many connections can
reveal an attack composed of successive suspicious deviations.

The results show a good efficiency of the detection algorithm. At the end of
a connection, it provides the corresponding similarity degree showing (or not)
an anomalous behavior. Depending on this degree, we should decide on the
consistency of the alarm, but we are not able to make such a decision as we do not
have a real CORBA application. The false alarm rate has not been studied for the
moment.

We propose to conduct further research complementary to our current work.
Further studies concern especially the following points:

- The variation of the penalty function degree $\alpha$: we used for our preliminary
tests a quadratic penalty function but we propose to vary this degree and
test other penalty functions.

- The dependence between successive request parameters: in fact, we think
that the invocation of a request with certain values has an influence on the
values of the following request. We are interested in studying these
correlations by expressing them using constraints defined at each node.

- We have also raised in this article the problem of insertion and deletion of
requests. We suggest accepting these situations to allow flexibility in the
client behavior. However, we will flag these situations by decreasing
the similarity degree.

- We discussed the possibility of reacting after detecting. An intrusion
detection system should be able to activate alarms, be sure of their consistency
and then react against these intrusions. Currently, we only work on
detection mechanisms. We are also thinking about a reactive mechanism for a real
CORBA application. One of the reviewers suggested that we propose a virtual
environment for a suspected client. One of the possible reactions against an
intruder is then to protect the real system from possible damage.

- The variation in the time elapsed between successive invocations may reveal
an anomalous behavior. Thus, we plan, as suggested by a reviewer, to
consider, in addition to the order of invoked requests in a sequence, the time
interval between successive requests.


Abstract. In 1998 (and again in 1999), the Lincoln Laboratory of MIT
conducted a comparative evaluation of Intrusion Detection Systems
developed under DARPA funding. While this evaluation represents a
significant and monumental undertaking, there are a number of unresolved
issues associated with its design and execution. Some of the methodologies
used in the evaluation are questionable and may have biased its results.
One of the problems with the evaluation is that the evaluators have
published relatively little concerning some of the more critical aspects of their
work, such as validation of their test data. The purpose of this paper is
to attempt to identify the shortcomings of the Lincoln Lab effort in the
hope that future efforts of this kind will be placed on a sounder footing.
Some of the problems that the paper points out might well be resolved if
the evaluators publish a detailed description of their procedures and the
rationale that led to their adoption, but other problems clearly remain.

Keywords: Evaluation, IDS, ROC Analysis 



1 Introduction 

The most comprehensive evaluation of research Intrusion Detection Systems that 
has been performed to date is an ongoing effort by MIT’s Lincoln Laboratory, 
performed under DARPA sponsorship. While this work is flawed in many re- 
spects, it is the only large scale attempt at an objective evaluation of these 
systems of which the author is aware. As such, it does provide a basis for mak- 
ing a rough comparison of existing systems under a common set of circumstances 
and assumptions. 

It is important to note that the present paper is a critique of existing work,
not a direct technical contribution or a proposal for new efforts, per se. Its
purpose is to examine the work done by the Lincoln Laboratory group in a
critical but scholarly fashion, relying on the public (published) record to the
greatest extent possible. The role of the critic is to ask questions and to point out
failings and omissions, but not necessarily to provide solutions to all the issues
raised. Indeed, the problem is large and complex, and one of the likely reasons
for many of Lincoln's failures is that its size and complexity clearly outstripped
the resources available to apply to it. Many of the issues raised in this paper will
require substantial resources and effort to resolve, and, at the time of writing,
these resources were not available. Still, it is to be hoped that the community as
a whole will be able to address these problems in connection with future efforts.

* This work was sponsored by the U.S. Department of Defense.

The analysis given here is presented with the goal of promoting a discussion 
of the difficulties that are inherent in performing objective evaluations of soft- 
ware. Although the software in question performs security related functions, the 
questions raised in its evaluation should be of interest to the broader software 
engineering community, particularly to that portion of the community that deals 
with software testing or evaluation and reliability estimation. As far as we have 
been able to determine, no comparable efforts have been reported elsewhere in 
the software evaluation and testing community. Only the usage modeling and
statistical testing used by the Cleanroom methodology [13, Ch. 10] seem to
come close.

We concentrate on the 1998 evaluation. The 1999 evaluation was under way 
when the original version of this paper was written and its results, though pre- 
sented in a number of meetings (including RAID 2000) have not been published 
in detail. Many of the changes made during the 1999 evaluation do not affect 
the observations or conclusions of this paper. The data used in 1999 was similar 
in form to that of 1998. Sessions were not identified in the training data, making 
the unit of analysis problem described in section 5.1 more difficult. TCP dump 
data was sensed both inside and outside the target system and a Windows NT 
victim was added. A relatively permissive security policy was described for the 
targets. A wider variety of attacks were represented in the data, including several 
insider attacks. Initial presentations of the 1999 results relied heavily on ROC 
analysis (see section 5.2), but more recent presentations have dropped this ap- 
proach entirely. Missed attacks were analyzed in some detail with investigators 
being asked to explain why their system did not detect them. In many cases, 
especially for rule based systems, the misses were due to decisions by the inves- 
tigators (encouraged by the sponsor) to concentrate on detection technique at 
the expense of complete rule bases. It would be interesting to rescore the 1999 
results, using the optimistic assumption that such misses are correctable in a 
production version of the system. 

The discussion begins with a consideration of the methods used to generate 
the data used for the evaluation. There are a number of questions that can be 
raised with respect to the use of synthetic data to estimate real world system 
performance. We concentrate on two of these: the extent to which the
experimental data is appropriate for the task at hand and the possible effects of the
architecture of the simulated test environment. This is followed by a discussion 
of the taxonomy developed to categorize the exploits involved in the evaluation. 
The taxonomy used was developed solely from the attacker’s point of view and 
may introduce a bias in evaluating manifestations seen by the attacked. 

The Lincoln Lab evaluation uses the ROC, variously known as the receiver
operating curve or relative operating characteristic, as the primary method for
presenting the results of the evaluation. This form of analysis has been used in a
variety of other fields, but it appears to have some unanticipated problems in its
application to the IDS evaluation. These involve problems in determining
appropriate units of analysis, bias towards possibly unrealistic detection approaches,
and questionable presentations of false alarm data.

2 Evaluation Overview 

The descriptions of the evaluation that have appeared in print leave much unsaid 
and it may be that a more detailed exposition of the work will alleviate some of 
the criticisms contained in this paper. The most detailed descriptions of the work 
available at the present time are Kristopher Kendall’s BS/MS Thesis [6] and a 
paper [9] presented at DISCEX in January, 2000. In addition, the Lincoln Lab 
team has made presentations on the experiment at various meetings attended 
by the author. These include the August 1999 DARPA PI meeting in Phoenix, 
AZ and RAID 99. Presentations [8, 4] similar to ones given at those meetings 
also appear at the Lincoln Lab experiment site, http://ideval.ll.mit.edu^. 

According to the DISCEX paper [9], “The primary purpose of the evalua- 
tions is to drive iterative performance improvements in participating systems by 
revealing strengths and weaknesses and helping researchers focus on eliminat- 
ing weaknesses.” The experiment claims to provide “unbiased measurement of 
current performance levels.” Another objective is to provide a common shared
corpus of experimental data that is available to a wide range of researchers.

While these goals are laudable, it is not clear that the way in which the 
evaluation has been carried out is consistent with the goals. In section 3 we 
will discuss the adequacy of the data set used during the evaluation, suggesting 
that, at best, its suitability for this purpose has not been demonstrated. The 
way in which the results of the evaluation have been presented (through the use 
of ROC and ROC like curves as discussed in section 5.2) seems to demonstrate 
a bias towards systems that can be tuned to a known mix of signal and noise, 
even though the appropriate tuning parameters may not be possible to discover 
in the wild. Each of these factors will be discussed further in the appropriate 
sections. 

Many of the systems evaluated by the Lincoln Lab group have been described
in a variety of technical publications, some of which are cited in the DISCEX
paper [9]. Each system under test was evaluated by its developers, who adapted
the data as necessary to fit the system in question [9, Section 7]. It is highly likely
that the disparate behaviors of the individual investigators introduced unintentional
biases into the results of the evaluation, but there has been no discussion of this
possibility in any of the presentations or in the DISCEX paper.

^ This site is password protected. For information concerning access, contact
intrusion@sst.ll.mit.edu.

3 The Evaluation Data

For reasons having to do with privacy and the sensitivity of actual data content, 
the experimenters chose to synthesize both the background data and the attack 
data used during the evaluation. There are problems with both components. 
The data also reflects problems that are inherent in the architecture used to 
generate it. The generated data is intended to serve as corpora for present and 
future experimenters in the field. As such, it may have a lasting impact on the 
way IDS systems are constructed. Unless the performance of an IDS system on 
the corpora can be related accurately to its performance in the wild, there is a 
risk that systems may be biased towards unrealistic expectations with respect 
to true detections, false alarms, or both. It is also necessary to ensure that the 
corpora are sufficiently large so that deviations from the desired norm do not 
alter evaluation results. 

The data generated for the evaluation consists of two components: background 
data that is intended to be completely free of attacks and attack data 
that is intended to consist entirely of attack scenarios. The test stream results 
from simultaneously generating and interleaving the two components. If we 
view background data as noise and attack data as signal, the IDS problem can 
be characterized as one of detecting a signal in the presence of noise. The eval- 
uation produces two measures, one primarily a function of the noise, the other 
primarily a function of the signal embedded in noise. Given this approach, it is 
necessary to ensure that both the signal and the noise used for the evaluation 
affect the systems under test in a manner related to signals and noise that occur 
in real deployment environments. 

3.1 Background Data 

The process used to generate background data or noise is only superficially 
described in the thesis and presentations. The data is claimed to be similar to 
that observed during several months of sampling data from a number of Air Force 
bases, but the statistics used to describe the real traffic and the measures used 
to establish similarity are not given, except for the claim that word and word 
pair statistics of email messages match those observed. The DISCEX paper [9, 
Sections 3 and 4] devotes approximately a page to a discussion of this issue 
and makes a broad claim that the data is similar to that seen on operational 
Air Force bases. It has been observed that internet site behaviors differ greatly^ 
and, while it is possible that Air Force bases form an exception, the notion of 
a typical mix of background traffic should be viewed with some skepticism. If 
this skepticism is justified, and if the nature of the background traffic is shown 
to have a substantial impact on IDS performance, we see no alternative to the 
provision of much more extensive corpora of evaluation data. 

^ This was pointed out by one of the anonymous reviewers for RAID 2000. Although 
it may not hold for Air Force bases, it is a factor to consider in extending the results 
of the evaluation to more general environments. 

As far as can be determined from the record, neither analytical nor experimental 
validation of the background data’s adequacy was undertaken prior to the 
evaluation. No rationale is given that would allow a reader to conclude that the 
systems under test should exhibit false alarm behaviors when exposed to the ar- 
tificial background data that are similar to those that they exhibit when exposed 
to “natural” data. This is particularly troublesome since the metric used for the 
evaluation of the IDS systems under test is an operating point characterized by 
the percentage of detected intrusions at a given false alarm rate or percentage. 
False alarms should arise exclusively from the background data, and it would 
appear incumbent upon the evaluators to show that the false alarm behavior of 
the systems under test is not significantly different on real and synthetic data. 

Real data on the internet is not well behaved. Bellovin reported on anoma- 
lous packets [2] some years ago. Observations by Paxson [12] indicate that the 
situation has become worse in recent years with significant quantities of random 
garbage being frequently observed on the internet. This internet “crud” consists 
of legitimate but odd looking traffic. Poor implementations of protocols often 
result in spontaneous packet storms that are indistinguishable from malicious 
attempts at flooding. Many of the packets that Bellovin and Paxson observe 
could (and probably should) be interpreted as suspicious. As far as we can tell, 
such packets were not included in the background traffic. 

None of the sources that we have examined contains any discussion of the data 
rate, and its variation with time is not specified. This may be another critical 
factor in performing an evaluation of an IDS system because it appears that 
some systems may have performance problems or may be subject to what are, 
in effect, denial of service attacks when deployed in environments with excessive 
data rates^. We have performed a superficial examination of several days of the 
tcpdump training data. The results indicate averages in the 10 to 50 kilobit per 
second range over the 22 hour period. Given that most of the activity occurs 
during working hours, the daylight rate may be 2 or 3 times this. In contrast, 
data rates at the Portland State University Computer Science department (≈100 
workstations) and Engineering school (several hundred) are 1 and 10 megabits 
per second respectively. Paxson indicates sustained data rates in excess of 30 
megabits per second [12] on the FDDI link monitored by the Bro IDS. Since one 
would expect false alarm rates to be proportional to the background traffic rate 
for a given mix, the false alarm rates reported by Lincoln Lab may need to be 
adjusted. 
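
The size of the adjustment can be sketched with simple arithmetic. The example below assumes, as the paragraph does, that false alarms scale linearly with the background traffic rate for a fixed mix. The rates are the approximate figures quoted above; the count of 10 false alarms per day is a hypothetical value chosen for illustration, not a result from the evaluation.

```python
# Approximate sustained data rates quoted in the text, in bits per second.
EVALUATION_RATE_BPS = 50_000  # upper end of the 10 to 50 kbit/s range
SITE_RATES_BPS = {
    "PSU Computer Science": 1_000_000,
    "PSU Engineering": 10_000_000,
    "Bro FDDI link (Paxson)": 30_000_000,
}


def scaled_false_alarms(observed_per_day, test_rate_bps, site_rate_bps):
    """Scale a false alarm count linearly with the background traffic rate.

    Assumes the traffic mix is unchanged, which the text itself questions.
    """
    return observed_per_day * site_rate_bps / test_rate_bps


# Hypothetical observation: 10 false alarms per day on the evaluation data.
for site, rate in SITE_RATES_BPS.items():
    estimate = scaled_false_alarms(10, EVALUATION_RATE_BPS, rate)
    print(f"{site}: about {estimate:.0f} false alarms per day")
```

Under these assumptions even a modest false alarm count on the evaluation data grows by two to three orders of magnitude at the busier sites.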

3.2 Attack Data 

Similar arguments can be made about the synthetic attack data. The attacks 
used were implemented via scripts and programs collected from a variety of 
sources. As far as can be determined from the available descriptions, no attempt 
was made to ensure that the synthetic attacks were realistically distributed in 

® This factor may not be relevant for an offline evaluation, but we would expect the 
evaluators to consider timing. 
the background noise. This may or may not be significant, depending on several 
factors, including the use to be made of the evaluation results, but it raises 
several issues. Reporting an aggregate result over any mix requires a strong 
caveat to the effect that the results may not apply to other mixes. This is more 
important if the mix is atypical^. In addition, some systems that require training 
on a known mix of attack and background data may be sensitive to the mix and 
fail to perform as well on substantially different mixes. 

Kendall [6, Section 12.2] describes the total number of attacks in various 
categories that were included in the training and test data sets. Some 300 attacks 
were injected into 10 weeks of data, an average of 3 to 4 attacks per day. Kendall 
gives a tabulation of the attack data in [6, Table 12.1]. In each of the major 
categories of the attack taxonomy (User to Root, Remote to Local User, Denial 
of Service, and Probe/Surveillance) the number of attacks is of the same order 
(114, 34, 99, and 64). This is surely unrealistic as current experience indicates 
that Probe/Surveillance actions are by far the most common attack actions 
reported. 

An aggregate detection rate based on the experimental mix represented in the 
corpora is highly unlikely to reflect performance in the field. If a more detailed 
analysis and presentation of the data were to be used, the attack mix would be 
less significant, although the user of the evaluation results would have to invest 
more time and effort in understanding the results and their significance. Par- 
ticular care would be needed to ensure that the presentation does not obscure 
the characteristics of the evaluated systems in this case. For example, the attack 
taxonomy used combines attacks that have widely differing manifestations. Re- 
porting, for example, that a given system detected 60% of the denial of service 
attacks may reflect the system’s ability to detect some kinds of manifestations 
and not others rather than reflecting on its ability to detect that taxonomic cat- 
egory of attack. Thus, even reporting performance by taxonomic attack category 
may be misleading if the distribution of attack manifestations is not explicitly 
considered. 
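
The sensitivity of an aggregate figure to the attack mix can be shown with a short sketch. The category counts are those quoted above from Kendall's tabulation; the per-category detection rates and the probe-dominated "field" mix are hypothetical values invented for illustration.

```python
# Attack counts per category as quoted in the text from [6, Table 12.1].
EVALUATION_MIX = {"u2r": 114, "r2l": 34, "dos": 99, "probe": 64}

# Hypothetical per-category detection rates for one imaginary system.
DETECTION_RATE = {"u2r": 0.80, "r2l": 0.30, "dos": 0.60, "probe": 0.20}

# Hypothetical probe-dominated mix, loosely in the spirit of field experience.
FIELD_MIX = {"u2r": 5, "r2l": 5, "dos": 20, "probe": 270}


def aggregate_detection_rate(mix, rates):
    """Weighted average of per-category detection rates for an attack mix."""
    total = sum(mix.values())
    return sum(mix[category] * rates[category] for category in mix) / total


print(f"evaluation mix: {aggregate_detection_rate(EVALUATION_MIX, DETECTION_RATE):.3f}")
print(f"field mix:      {aggregate_detection_rate(FIELD_MIX, DETECTION_RATE):.3f}")
```

The same system, with unchanged per-category behavior, reports a markedly different aggregate rate once probes dominate the mix.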

The evaluation data represents an attempt at creating a somewhat realistic 
test environment in which known attacks are executed in a background of normal 
activity. A number of researchers have said that it would also be useful to present 
a variety of attacks under ideal conditions without background traffic so as to 
separate detection characteristics from the confounding effects of the background 
traffic. Providing attack data in this form was not one of the goals of the Lincoln 
effort. 

While it is clear that there is no such thing as a typical mix of attacks, experi- 
ence over the past few years indicates that hacker activity on the internet (the 
attack population represented in the experimental mix) consists primarily of probe 
activities followed by fairly large numbers of the most recently popular attack du 
jour, followed by a sprinkling of less recently publicized attacks against well known 
vulnerabilities. 

3.3 Eyrie AFB 

The simulated data is said to represent the traffic to and from a typical Air Force 
Base, referred to as Eyrie AFB. The thesis [6, Figure 3-1] and the information 
available from the Lincoln Lab web site seem to differ on the details of the 
configuration. The host list for weeks 3-7 lists additional hosts linux1 - linux10 
which are probably implemented on the additional Linux target mentioned in the 
thesis, but the week 3-7 network diagram does not show this host. The DISCEX 
paper [9, Section 3] is less specific. 

The thesis contains a list of the attacks [6, Appendix A] from the test phase of 
the evaluation. 45 attacks target Pascal, 28 target Marx, 12 target Zeno, 10 target 
one of the virtual Linux machines, and 5 (all the same scenario) target the router. 
The only attacks that attempt to access any of the other simulated machines at 
Eyrie are probes or scans for which no response is necessary. The skewed nature 
of the attack distribution may affect the evaluation. By the end of the training 
period, it should have been clear to the testers that only a small subset of the 
systems are actually subject to interactive attacks. Tuning or configuring the 
IDS under evaluation to look only at these systems would be an effective way to 
reduce false alarms and might raise the true alarm rate by reducing noise. This 
appears to fall within the letter, if not the spirit, of the 1998 rules though there 
is no evidence that it was done by any of the participants. 

Although it is claimed that the traffic used in the evaluation is similar to 
that of a typical Air Force base, no such claim is made for the internal net- 
work architecture used. The unrealistic nature of the architecture is implicitly 
acknowledged by Kendall [6, Section 6.8] where it is noted that the flat structure 
of the simulation network precluded direct execution of a “smurf” or ICMP echo 
attack. It is not known whether the flat network structure used in the experiment 
is typical of Air Force bases, but this seems doubtful as does the relatively small 
host population. Whether this, together with the limited number of hosts 
attacked, affects the evaluation needs to be investigated. Certainly, intrusion 
detection systems that make a stateful evaluation of the traffic stream are less likely to suffer 
from resource exhaustion in such a limited environment. 



3.4 Does It Matter? 

Perhaps and perhaps not. Many experiments and studies are conducted in en- 
vironments that are contrived. Usually, this is done to control for factors that 
might confound the results. When it is done, however, the burden is on the ex- 
perimenter to show that the artificial environment did not affect the outcome of 
the experiment. A fairly common method of demonstrating that the experimental 
approach being used is sound is to conduct a controlled pilot study to collect 
evidence supporting the proposed approach. As far as we can tell, no pilot studies 
were performed either to validate the use of artificial data or to ensure that the 
data generation process resulted in reasonably error free data. The evaluators at 
Lincoln Lab have not shown that the test environment that they created does 
not confound the evaluation in ways that would affect its objectives. 

3.5 Training and Test Data Presentation 

The evaluators prepared datasets for the purposes of “training” and “test.” The 
training set consists of seven weeks of data covering 22 hours per day, 5 days per 
week. As discussed in section 5.1 below, the training data contains attacks that 
are identified in the associated lists. It also contains examples of anomalies, here 
defined rather restrictively as departures from the normal behaviors of individual 
system users rather than the more common usage of abnormal or unusual events. 

The apparent purpose of this data was to provide the researchers being eval- 
uated with corpora containing known and identified attacks that could be used 
to tune their systems. For the systems based on the detection of anomalies, the 
training data was intended to provide a characterization of “normal,” although 
the presence of attacks in the data renders it questionable from this standpoint. 
The question of the adequacy of this data for its intended purpose does not 
seem to have been addressed. There is no discussion, for example, of whether 
the quantity of data presented is sufficient to train a statistical anomaly system 
or other learning based system. Similarly, there is no discussion of whether the 
rates of intrusions or their relationship to one another is typical of the scenarios 
that such detectors might expect. 

For systems using a priori, non-parametric rules for detecting intrusion 
manifestations, the training data provides a sanity check, but little more. If there are 
background manifestations that trigger the same rule as an identified intrusion 
in the training data, and the developer wishes to use the training data to guide 
development of the system, the developer might attempt to refine the rules to be 
more discriminatory. The developer could also change the way in which the system operates 
to make detections probabilistic, based on the relative frequencies of identified 
intrusion manifestations and background manifestations that trigger the same 
rule. As we will see later, the ROC analysis method is biased towards detection 
systems that use this kind of approach. 

For systems that can be tuned to the mix of background and intrusions 
present in the training data, this bias may be inherent depending on whether 
the detection methods result in probabilistic recognitions of intrusions or whether 
internal thresholds are adjusted to achieve a similar effect. The problem with 
tuning the system to the data mix present in the training data is that transfer- 
ring the system experience to the real world either requires demonstrating that 
the training mix is an accurate representation of real world data with respect 
to the techniques used by each system or it requires that accurate real world 
training data be available for each deployment environment. We claim that the 
former conditions have not been met and that the latter may not be possible. 
As far as we are aware, existing studies of network traffic patterns show a high 
degree of variability among sites as well as substantial changes with time at a 
given site. As we have noted earlier, unless the target environments, i.e. military 
installations, are atypical, it may be the case that there is no such thing as a 
“typical” traffic mix that is suitable for background data. If each deployment 
environment is characterized by a unique traffic mix and if the ability of an IDS 
to detect intrusions effectively depends on tuning it to match the mix under 
controlled conditions, the problem may well be intractable. More work on traffic 
characterization and the effects of traffic variability on the IDSs is clearly 
needed. 

If one views the corpora of training data as a form of benchmark against 
which present and future IDS systems might be evaluated, there is also a risk 
that systems might be optimized for the benchmark at the expense of normal 
case behavior. This is a well known problem in the software evaluation field. 

4 The Taxonomy of Attacks 

Kendall’s thesis uses a taxonomy of attacks that was originally developed by 
Weber [15]. The taxonomy describes intrusions from an intruder centric view- 
point based loosely on a user objective. For the purposes of the evaluation, the 
attacks used were characterized as 

1. Denial of Service, 

2. Remote to user, 

3. User to Superuser, or 

4. Surveillance/Probing 

and were further characterized by the mechanism used. The mechanisms were 
characterized as 

m Masquerading (stolen password or forged IP address) 
a Abuse of a feature 
b Implementation bug 
c System misconfiguration 
s Social engineering 

While this taxonomy describes the kinds of attacks that can be made on 
systems or networks, it is not useful in describing what an intrusion detection 
system might see. For example, in the denial of service category, we see attacks 
against the protocol stack, against protocol services, against the mail, web, and 
syslog services, and against the system process table. The effects range from 
machine and network slowdowns to machine crashes. From the standpoint of 
a network or host observer (i.e. most intrusion detection systems), the attack 
manifestations have almost nothing in common. From this, it can be seen that 
the taxonomy used in the evaluation offers very little support for developing an 
understanding of intrusions and their detection. We suggest that the taxonomy 
used is not particularly supportive of the stated objectives of the evaluation and 
that one or more of the potential taxonomies discussed in the following section 
could be more useful in guiding the process. 

The attacker centric taxonomy poses an additional problem. By tying attacks 
to overt actions on the part of a putative attacker, it creates a highly unrealistic 
evaluation bias. The treatment of probes is a case in point. Not all probes are 
hostile. They are a standard way of attempting to initiate internet communi- 
cation, but communication does not always occur even when the probed host 
acknowledges that it provides the probed for service. As far as we have been able 
to tell, the 1998 background data does not contain this kind of benign probe ac- 
tivity, but the evaluation data contained at least one “attack” that consisted of a 
very small number of probes. We claim that, had the background data contained 
a typical mix of normal or benign probe data, these probes would have been dis- 
tinguishable as attacks only if the intent of the prober were known. While this 
is possible in the evaluation context, it is generally not possible in the field. 

4.1 An Alternative Taxonomy 

Attacks could be classified based on the protocol layer and the particular pro- 
tocol within the layer that they use as the vehicle for the attack. Under this 
approach, attacks such as “Land,” “Ping of Death,” and “Teardrop” are related 
because they never get out of the protocol stack. They are also similar in being 
detectable only by an external observer looking at the structure of the packets 
for the identifying characteristics. Smurf and UDPStorm attacks are even lower 
in the hierarchy because they affect the network and interface in the neigh- 
borhood of the victim. Also, they are detectable based on counting of packet 
occurrences which could be considered a lower level operation than examining 
packet structure. Attacks that involve altering the protocol stack state such as 
“SYNFlood” are higher since their detection either involves monitoring the state 
of the protocol stack internally, or modeling and tracking the state based on an 
external view. Attacks that require the protocol stack to deliver a message to an 
applications process (trusted or not) are still higher. Detecting such attacks re- 
quires either monitoring the messages within the host (between the stack and the 
application or within the application) or modeling the entire stack accurately, 
assembling messages externally and examining the interior data with respect to 
the view of the attacked application to determine the attack. Probes can take on 
a variety of forms, but are usually handled either within the stack (especially if 
the service sought is not supported) or via interaction with the application that 
supports the probed for service. 
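
One way to see how this classification ties each attack to what a detector must observe is to write it down as a small lookup table. The sketch below is only an illustration: the `Level` names and their numeric ordering are hypothetical labels chosen here, while the attack names and their relative placement follow the examples in the paragraph above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordering of observation levels, lowest first."""

    NETWORK_VOLUME = 1    # counting packet occurrences near the victim
    PACKET_STRUCTURE = 2  # examining the structure of individual packets
    STACK_STATE = 3       # monitoring or modeling protocol stack state
    APPLICATION = 4       # examining messages delivered to an application


# Placements follow the examples given in the text.
ATTACK_LEVEL = {
    "Smurf": Level.NETWORK_VOLUME,
    "UDPStorm": Level.NETWORK_VOLUME,
    "Land": Level.PACKET_STRUCTURE,
    "Ping of Death": Level.PACKET_STRUCTURE,
    "Teardrop": Level.PACKET_STRUCTURE,
    "SYNFlood": Level.STACK_STATE,
}


def attacks_at(level):
    """Return the attacks whose detection requires observation at `level`."""
    return sorted(name for name, lvl in ATTACK_LEVEL.items() if lvl == level)


print(attacks_at(Level.PACKET_STRUCTURE))
```

Grouping by level answers the detector builder's question directly: a system that only counts packets cannot be expected to see anything above the first level.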

A strength of this taxonomic approach is that it leads to an understanding 
of what one must do to detect attacks. Within a particular higher level proto- 
col or service this view may group attacks that exploit common vulnerabilities 
together, for example “Apache2” and “Back” exploit pathologies in the http 
specification while “phf” exploits a bug in the web server’s implementation of 
CGI bin program handling. 

Many other taxonomies are possible. The point is that the taxonomy must 
be constructed with two objectives in mind: describing the relevant universe 
and applying the description to gain insight into the problem at hand. Weber’s 
taxonomy serves the first purpose fairly well, but fails to provide insights useful 
to understanding the detection of intrusions. 

5 The Evaluation 

The results of the evaluation and the way in which they have been presented 
by Lincoln Lab present a number of difficulties. We examine several of these, 
notably the problem of determining an appropriate “unit of analysis” and prob- 
lems associated with the use of the ROC method of analysis. The unit of analysis 
problem arises whenever experimental results are reported as percentages. The 
evaluated IDS systems report detections which can be characterized as either 
correct, i.e. an attack was reported when one was present, or incorrect, i.e. a 
false alarm, an attack that was reported when one was not present. The ROC 
method requires both correct and incorrect detections to be reported as per- 
centages of the possible cases in which the detection could have been made. In 
the case of the evaluation, successful detections can be reported as (number of 
attacks detected) / (number of attacks made), but no comparable denominator 
exists for reporting false alarms. The unit of analysis problem is well known in 
other fields [16] where it often results in ascribing more power than is appro- 
priate to the results of certain statistical tests. While this is not the case here, 
the problem exists and its solution is a necessary prerequisite to performing 
meaningful comparisons among systems. For example, two systems may raise 
the same number of false alarms and have different false alarm percentages if 
one bases its decisions on the examination of entire protocol sessions while the 
other examines individual packets. 
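
The effect of the denominator is easy to demonstrate numerically. In the sketch below all figures are hypothetical: both systems raise the same 50 false alarms on the same attack-free traffic, but one makes a decision per session and the other per packet.

```python
def false_alarm_percentage(false_alarms, benign_decisions):
    """False alarms as a percentage of the decisions made on attack-free input."""
    return 100.0 * false_alarms / benign_decisions


# Hypothetical figures for the same stretch of attack-free traffic.
FALSE_ALARMS = 50
BENIGN_SESSIONS = 10_000    # decisions made by a session-based system
BENIGN_PACKETS = 2_000_000  # decisions made by a packet-based system

session_pct = false_alarm_percentage(FALSE_ALARMS, BENIGN_SESSIONS)
packet_pct = false_alarm_percentage(FALSE_ALARMS, BENIGN_PACKETS)
print(f"session-based: {session_pct:.4f}%   packet-based: {packet_pct:.4f}%")
```

An operator sees the same 50 alarms in either case, yet the reported percentages differ by a factor of 200.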

ROC analysis is a powerful technique for evaluating detection systems, but 
there are a number of underlying assumptions that must be satisfied for the 
technique to be effective. It is not clear that these assumptions are or can be 
satisfied in the experimental context. In addition, ROC analysis is biased towards 
a classical detection approach not commonly used in IDS systems. 

5.1 TCPdump Data and the Unit of Analysis Problem 

The largest data set^ made available to investigators for evaluating their systems 
consists of raw TCPdump data collected with a sniffer positioned on the network 
segment external to the Eyrie AFB router. This dataset should contain all the 
data generated inside the simulated base destined for the outside world and all 
the data generated outside the base destined for an inside location. Experience 
with TCPdump indicates that it can become overloaded and drop packets al- 
though the possibility of this is reduced by the apparently low data rates used. 
The thesis indicates that attacks were “verified” by hand and that this process 
was very labor intensive [6, Section 13.2.2], but it is unclear what verification 
means here. 

Training data is accompanied by a list of the “sessions” that are present in 
the TCPdump data where a session is characterized by a starting time, duration, 
source, destination, and a protocol. If the session contained an attack, the list 

® Solaris BSM audit data and file system dump data were also available. We have not 
looked at them. 
identifies the attack. Examination of a sample of the TCPdump data indicates 
that it contains additional traffic, e.g. messages from ethernet hubs, that is not 
in the list. 

The association of alarms with sessions is an instance of a more general unit 
of analysis problem. The question of an appropriate denominator for presenting 
the evaluation results is only superficially addressed. It may not be appropriate 
to use the same denominator for all systems and the choice of a denominator 
may vary from system to system or even from attack to attack within the same 
system. The appropriate unit of analysis is that body of information on which 
the system based its decision to raise or not raise an alarm. The denominator 
for the expression giving the percentage of true alarms is the number of cases 
when this decision point was reached and the body of data used to make the 
decision contained a manifestation of a real intrusion. Similarly, the appropriate 
denominator for false alarms is then the number of times that the system reached 
this decision point when the data on which the decision was based did not 
contain a manifestation of a real intrusion. These numbers are a function of the 
detection process and cannot be externally imposed unless the decision criteria 
are externally specified. Sessions may be the natural unit on which to base 
decisions in some systems but not in others, and using them as the unit of 
analysis where they are not appropriate will bias the results. 

The use of sessions as the unit of analysis presents other potential problems. 
Attacks are, of necessity, associated with a single session under this model, pre- 
cluding the injection of coordinated attack behavior involving multiple sources 
and/or protocols. For example, one could envision probes carried out from a 
large number of locations so that no single source address appears more than 
once. The session model seems to preclude this. Although the injected attacks 
are associated with sessions in the test data, nothing constrains the evaluated 
systems to use the session concept, and it is possible that alarms may be raised 
as a result of events contained in more than one session. 

5.2 Scoring and the ROC 

The Lincoln Lab team decided to use a technique known as the ROC® as the 
method for presenting their results and the use of this technique is claimed as 
one of the major contributions of their effort in the DISCEX paper [9, Section 
2]. The ROC has its origin in radar signal detection techniques developed during 
World War II and was adopted by the psychological and psychophysical research 
communities during the early post war era [14]. Its adoption by the Lincoln Lab 
group is not surprising given that their background is in speech recognition (word 
spotting in particular). Much of the discussion that follows is due to Egan [3]. 
Signal detection theory was developed during the two decades following World 
War II to give an exact meaning, in a probabilistic sense, to the process of 

® The term ROC originally stood for Receiver Operating Curve. Since the technique 
has been widely used to evaluate systems that do not have a recognizable receiver, 
ROC is commonly interpreted as Relative Operating Characteristic. 



recognizing a wanted signal that has been degraded by noise. The methods took 
into account the relationship between the physical characteristics of the signal 
and the theoretically achievable performance of the observer. Later the concepts 
of signal detection theory were adapted to provide a basis for examining some 
problems in human perception. The basis for the ROC is given by Egan [3, p. 2]: 

When the detection performance is imperfect, it is never assumed 
that the observer “detects the signal.” Rather, it is assumed that the 
observer receives an input, and this input corresponds to, or is the equiv- 
alent of, the unique value of a likelihood ratio. Then, given other factors, 
such as the prior probability of signal existence, the observer makes the 
decision “Yes, the odds favor the event signal plus noise” or “No, the 
odds favor the event noise alone.” 

Egan goes on to note that signal detection theory consists of two parts, 
decision theory, which deals with the rules to be used in making decisions that 
satisfy a given goal, and distribution theory, dealing with the way in which the 
signals and noise are distributed. When the distributions are known (or can be 
assumed) the relationship between the distributions and possible performances 
is called ROC analysis. 




Fig. 1. A single point ROC 



A typical ROC curve is a plot on two axes as seen in Figure 1. The vertical 
axis measures the true positive rate of the system (i.e. the Bayesian detection 
rate or the probability of a recognition given that signal plus noise is present). 
The horizontal axis gives the false positive rate (i.e. the probability that an alarm 
is raised given that only noise is present). An evaluation of a system provides 
estimates of these probabilities as the percentage of accurate and inaccurate 
recognitions in a series of trials under fixed conditions. By fixed conditions here, 
we mean constant distributions of signal plus noise and noise. 

Note that there are two crucial aspects of the process. First, the observer 
receives an input and second, the observer makes a decision concerning that 
input. The observer thus controls the unit of analysis problem by defining the 
unit of analysis as the quantity of input on which a decision is made. Both 
positive and negative decisions must be recorded so that event counts for the 
denominators of the percentages used in ROC analysis will be available. Unless 
all the systems under evaluation are based on the same notion of an event on 
which a decision is to be made, choosing an arbitrary division in the input such 
as a packet or a session does not supply the necessary denominator. 
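
The bookkeeping this implies can be sketched as follows. The function and the decision log are hypothetical; the point is only that each entry corresponds to one unit of input on which the detector actually made a decision, and that negative decisions must be logged as well as alarms.

```python
def roc_point(decisions):
    """Compute (false positive rate, true positive rate) from a decision log.

    `decisions` is an iterable of (signal_present, alarm_raised) pairs, one
    per unit of input on which the detector made a decision.
    """
    tp = fp = positives = negatives = 0
    for signal_present, alarm_raised in decisions:
        if signal_present:
            positives += 1
            tp += int(alarm_raised)
        else:
            negatives += 1
            fp += int(alarm_raised)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one decision of each kind")
    return fp / negatives, tp / positives


# Hypothetical log: 4 decisions on attack input, 6 on attack-free input.
log = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False),
    (False, False), (False, False), (False, False),
]
print(roc_point(log))
```

If only the alarms were logged, the six attack-free decisions would be unknown and the false positive rate could not be computed at all.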

Both parametric and non-parametric IDS detectors exist. Non-parametric 
detectors have no provision for adjusting the sensitivity of the detection 
mechanism to effect a tradeoff between detection rates and false alarm rates. Examples 
include signature systems in which the attack signature is matched or it isn’t 
and finite state approaches that raise an alert only if the underlying automaton 
reaches an accepting state. Parametric systems have adjustable thresholds or are 
able to assign probabilities to alerts based, e.g., on a priori knowledge of signal 
and noise distributions^ or on quantifiable uncertainties in the detection process. 
The latter is more likely to be a property of anomaly detectors, especially those 
based on population or individual statistical properties. 
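
The finger probe example in the footnote to this paragraph can be simulated directly. The sketch below assumes the footnote's figures (0.1% of probes are attack precursors, every probe is seen) and treats an alert as raised when its assigned probability is at or above the threshold; the function name and the population of 100,000 probes are illustrative choices.

```python
ATTACK_PRIOR = 0.001  # 0.1% of finger probes precede an attack (footnote figure)


def operating_point(threshold, n_probes=100_000):
    """Return (detection rate, false alarm rate) at a given alert threshold.

    Every probe receives the same score, the prior, because nothing else
    distinguishes hostile probes from benign ones.
    """
    n_attack = round(n_probes * ATTACK_PRIOR)
    n_benign = n_probes - n_attack
    alarm = ATTACK_PRIOR >= threshold  # identical decision for every probe
    detected = n_attack if alarm else 0
    false_alarms = n_benign if alarm else 0
    return detected / n_attack, false_alarms / n_benign


for threshold in (0.0, 0.0005, 0.001, 0.002, 0.5):
    print(threshold, operating_point(threshold))
```

Because all probes share one score, sweeping the threshold yields only the two extreme operating points, which is the degenerate behavior the footnote describes.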

If the ROC is an appropriate mechanism for presenting the results of an 
IDS evaluation in which non parametric, binary, decisions are made, the curve 
will consist of a single point that expresses the true positive and false positive 
percentages for the entire evaluation. The justification for drawing lines from 
the (0,0) coordinate to the point and from the point to the (1,1) coordinate is 
counterintuitive, imposing a probabilistic model where none is present. Nonethe- 
less, the lines are usually presented as shown, and we follow the tradition in our 
presentation. In the environment in which most IDS systems operate, the sig- 
nal percentage is very small® requiring very low false positive rates for useful 
detection as discussed in a recent paper by Axelsson [1]. 

^ Suppose that we know that 0.1% of the probes for finger service are precursors to an 
attack, while 99.9% are benign. How should we deal with this situation? Assuming 
that we can detect the probe 100% of the time, we can raise an alert with a 0.1% 
probability that it represents an attack every time a finger probe occurs. The ROC 
method requires us to classify each alert as either a successful detection or as a false 
alarm, but allows us to vary the threshold for the decision. As we vary the threshold 
from 0.0% to 0.1% the curve will show 100% detection rate and 100% false alarm rate 
since both attacks and false alarms are assigned a probability above the threshold. 
Above a threshold of 0.1%, both the detection rate and false alarm rate drop to 
0.0%. The problem here is that there is very little signal (attack instances) and a 
lot of noise (benign use of the finger service). In the absence of other factors that 
allow us to rehne the probability assigned to a given probe, the a priori distribution 
does not help and we are left with two choices; ignore finger probes (missing a small 
number of attack indicators) or raise a large number of false alarms. 

® This assumes that a small unit of analysis is chosen for computing the denominator 
of the false alarm rate. 
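
The finger probe example in the footnote above can be reproduced with a short
sketch. It assumes a hypothetical population of 1000 probes, one of which
precedes an attack, and assumes that an alert is raised whenever the assigned
probability is at least the threshold; the function name roc_point_at is
introduced here only for illustration.

```python
def roc_point_at(threshold, probes):
    """Return (detection rate, false alarm rate) at one threshold.

    probes is a list of (score, is_attack) pairs; an alert is raised
    whenever score >= threshold.
    """
    attack_scores = [score for score, is_attack in probes if is_attack]
    benign_scores = [score for score, is_attack in probes if not is_attack]
    detection_rate = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    false_alarm_rate = sum(s >= threshold for s in benign_scores) / len(benign_scores)
    return detection_rate, false_alarm_rate


PRIOR = 0.001  # 0.1% of finger probes are attack precursors
# Every probe receives the same a priori score, whether attack or benign.
probes = [(PRIOR, True)] * 1 + [(PRIOR, False)] * 999

for threshold in (0.0, 0.0005, 0.001, 0.002):
    print(threshold, roc_point_at(threshold, probes))
```

Because every probe carries the same score, the sweep can only produce the
(1, 1) and (0, 0) corners of the ROC plane, which is the degenerate behavior
the footnote describes.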

As far as we are able to tell, none of the IDSs under evaluation use a
likelihood ratio estimator that considers both the signal distribution and the noise
distribution as their decision criteria, and little is known about the in vitro
distributions of intrusions and background activity that would make this fruitful.
Most of the systems use only signal plus noise characteristics (signature based
systems) or only noise characteristics (anomaly detection systems). The issue of
tuning systems that use a priori distributions implicitly by learning or training
procedures has been discussed above.

5.3 Errors per Unit Time 

The DISCEX paper uses a non-standard variation of the ROC presentation [9, 
Figure 4] that labels the horizontal axis with false alarms per day rather than 
percent false alarms. A search of the traditional ROC literature [14, 3] shows no 
mention of this formulation. It does appear, without comment or justification, in
the word spotting literature where it is usually [7], but not always [5], referred 
to as a ROC curve. 

Many of the corpora used for word spotting evaluations come from NIST, 
but researchers at NIST disavow the origin of the formulation saying that it was 
already in use when they entered the field. According to Alvin Martin of NIST, 
the earliest use of the formulation of which he is aware appeared in technical 
reports from Verbex Corporation in the late 1970s [10]. We were able to locate 
Stephen L. Moshier, one of the founders of Verbex and an author of some of the 
reports mentioned by Martin. He reported [11] that 

The military customer perceived that the user of a word spotter could 
cope with alarms (true or false) happening at a certain average rate but 
would becomes overloaded at a higher rate. So that is a model of the 
user, not a model of the incoming voice signals. 

One of the more powerful features of the ROC analysis is its ability to abstract 
away certain experimental variables such as the rates at which detections are 
performed. The primary factors that influence ROC results are the detector 
characteristics and the distributions of signals and noise. If the latter are realistic, 
the ROC presentation of the detector characteristics should have good predictive 
power for detector performance in similar environments. 

The pseudo-ROC, as we choose to call the word spotting form, breaks these
abstractions. By using incomparable units on the two axes, the results are strongly
influenced by factors, such as data rate, that ought to be irrelevant. The form 
shown in the DISCEX paper is misleading for a number of reasons, notably be- 
cause of its failure to present the relevant information. Using the data set as 
provided for the evaluation, but reassigning values to the time stamps attached 
to the data items, the false alarm rate per unit time can be manipulated to any 
degree desired by varying the total duration represented by the dataset‡. At
the very least, the pseudo-ROCs presented by Lincoln Lab [9, Figure 4] should
be labeled with the data rate on which the false alarm axis is based. This is
especially true given that the data rates used in the evaluation appear to be
unrealistically low. Using the evaluated systems on data streams with megabit
rates might result in a ten to hundredfold increase in the false alarm rate when
reported per unit time.

‡ Changing the timestamps so as to give the appearance that a five day dataset
represented a single day would raise the false alarms per day by a factor of five.
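
The sensitivity of the per-day figure to the assumed duration can be shown
with a few lines of arithmetic. The counts below are hypothetical; only the
five-day versus one-day comparison comes from the text.

```python
def false_alarms_per_day(false_alarms: int, duration_days: float) -> float:
    """Pseudo-ROC horizontal axis: depends on how long the dataset is said to last."""
    if duration_days <= 0:
        raise ValueError("duration must be positive")
    return false_alarms / duration_days


def false_alarm_rate(false_alarms: int, noise_events: int) -> float:
    """Conventional ROC horizontal axis: depends only on the decisions made."""
    return false_alarms / noise_events


FALSE_ALARMS = 50        # hypothetical count over the whole dataset
NOISE_EVENTS = 100_000   # hypothetical number of noise-only decisions

as_recorded = false_alarms_per_day(FALSE_ALARMS, duration_days=5.0)
compressed = false_alarms_per_day(FALSE_ALARMS, duration_days=1.0)
rate = false_alarm_rate(FALSE_ALARMS, NOISE_EVENTS)

print(f"per day, five-day timestamps: {as_recorded}")
print(f"per day, one-day timestamps:  {compressed}")
print(f"false alarm rate (unchanged): {rate}")
```

The same detector output on the same data yields a fivefold difference on the
per-day axis, while the conventional false alarm rate does not move.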

6 Conclusions 

The Lincoln Lab evaluation is a major and impressive undertaking, but its ben- 
efits seem to be far out of proportion with its costs and impacts on research 
programs. It is not clear that the results of the evaluation predict deployed per- 
formance. Reducing the performance of these systems to a single number or to 
a small group of numbers is not particularly useful to the investigators since 
the numbers have no explanatory power. While detection and false alarm rates 
are important at a gross level and might be a basis for comparing commercial 
products, the research community would benefit from an evaluation approach 
that would provide constructive advice for improvement. 

It is hoped that this critique will lead to a rethinking of the evaluation
process and a recreation of it in a form that will help IDS development move
forward. If the evaluation process cannot be modified so that it makes a substantial
contribution to the improvement of the IDS state of the art, it would be better
to abandon the evaluations for the present. Indeed, it appears that DARPA is
currently rethinking its approach to evaluation in response to this and the other
criticism that it has received from other members of the IDS community.

Abstract. Eight sites participated in the second DARPA off-line intrusion
detection evaluation in 1999. Three weeks of training and two weeks
of test data were generated on a test bed that emulates a small gov- 
ernment site. More than 200 instances of 58 attack types were launched 
against victim UNIX and Windows NT hosts. False alarm rates were 
low (less than 10 per day). Best detection was provided by network- 
based systems for old probe and old denial-of-service (DoS) attacks and 
by host-based systems for Solaris user-to-root (U2R) attacks. Best over- 
all performance would have been provided by a combined system that 
used both host- and network-based intrusion detection. Detection accu- 
racy was poor for previously unseen new, stealthy, and Windows NT 
attacks. Ten of the 58 attack types were completely missed by all sys- 
tems. Systems missed attacks because protocols and TCP services were 
not analyzed at all or to the depth required, because signatures for old 
attacks did not generalize to new attacks, and because auditing was not 
available on all hosts. 



1 Introduction 

Computer attacks launched over the Internet are capable of inflicting heavy dam- 
age due to increased reliance on network services and worldwide connectivity. 
It is difficult to prevent attacks by security policies, firewalls, or other mecha- 
nisms. System and application software always contains unknown weaknesses or 
bugs, and complex, often unforeseen interactions between software components
and/or network protocols are continually exploited by attackers. Intrusion detec- 
tion systems are designed to detect attacks that inevitably occur despite security 
precautions. 

Discussions of alternate approaches to intrusion detection are available 
in [1,6,16]. Some approaches detect attacks in real time and can be used to 
monitor and possibly stop an attack in progress. Others provide after-the-fact 
forensic information about attacks and can help repair damage, understand the 
attack mechanism, and reduce the possibility of future attacks of the same type. 
More advanced intrusion detection systems detect new, never-before-seen
attacks, while the more typical systems detect previously seen, known attacks.

The widespread deployment and high cost of both commercial and
government-developed intrusion detection systems has led to an interest in eval- 
uating these systems. Evaluations that focus on algorithm performance are es- 
sential for ongoing research. They can contribute to rapid research progress by 
focusing efforts on difficult technical areas, they can produce common shared 
corpora or data bases which can be used to benchmark performance levels, and 
they make it easier for new researchers to enter a field and explore alternate 
approaches. A review of past intrusion detection evaluations is provided in [11]. 

The most comprehensive evaluations of intrusion detection systems per- 
formed to date were supported by DARPA in 1998 and 1999 [3,11,12]. These 
evaluations included research intrusion detection systems and attacks against 
UNIX, Windows NT, and Cisco Routers. They also used a relatively simple 
network architecture and background traffic designed to be similar to traffic on 
one Air Force base. The most recent 1999 evaluation included many novel as- 
pects [11]. Both detection and false alarm rates were carefully measured for more 
than 18 systems. More than 56 attack types, including stealthy and novel
attacks, were used to measure detection rates, and weeks of background traffic were
used to measure false alarm rates. In addition, a unique intrusion detection cor- 
pus was created that includes weeks of background traffic and hundreds of labeled 
and documented attacks. This corpus has been widely distributed and is being 
used as a benchmark for evaluating and developing intrusion detection systems. 
Both 1998 and 1999 DARPA evaluations included two components. An off-line 
component produced labeled benchmark corpora that were used simultaneously 
at many sites to develop and evaluate intrusion detection systems [11,12]. The 
complementary real-time component [3] assessed only systems that had real-time 
implementations using fewer attacks and hours instead of weeks of background 
traffic. The remainder of this paper focuses on the off-line component of the 1999 
evaluation. It provides a summary of this research effort, discusses details con- 
cerning the motivation and design of background traffic and stealthy attacks, and 
discusses an analytic approach that can be used to predict whether an intrusion 
detection system will miss a particular new attack. This paper complements [11], 
which provides further background and summary results for the 1999 off-line 
evaluation. Detailed descriptions of attacks in the 1999 evaluation are available 
in [2,8,9,13,14]. Further details and downloadable corpora are available at [14]. 

2 Overview of the 1999 Evaluation 

The 1999 off-line evaluation included three weeks of training data with back- 
ground traffic and labeled attacks to develop and tune intrusion detection sys- 
tems and two weeks of test data with background traffic and unlabeled attacks. 
Techniques originally developed during the 1998 evaluation [12] were extended 
to more fully analyze system behavior and cover more attack types. Figure 1 
shows the isolated test bed network used to generate background traffic and
attacks. Scripting techniques, which extend the approaches used in [19], generate
live background traffic similar to that which flows between the inside of one Air
Force base and the outside Internet.

Fig. 1. Block diagram of 1999 test bed (regions labeled INSIDE and OUTSIDE)

This approach was selected for the evaluation
because hosts can be attacked without degrading operational Air Force
systems and because corpora containing background traffic and attacks can be 
widely distributed without security or privacy concerns. A rich variety of back- 
ground traffic that looks as if it were initiated by hundreds of users on thousands 
of hosts is generated in the test bed. The left side of Figure 1 represents the in- 
side of the fictional Eyrie Air Force base created for the evaluations and the right 
side represents the outside Internet. Automated attacks were launched against 
four inside UNIX and Windows NT victim machines (Linux 2.0.27, SunOS 4.1.4, 
Sun Solaris 2.5.1, Windows NT 4.0) and a Cisco 2514 router. More than 200 in- 
stances of 58 different attacks were embedded in three weeks of training data and 
two weeks of test data. Inside and outside machines labeled sniffer in Figure 1 
run a program named tcpdump [10] to capture all packets transmitted over the 
attached network segments. This program was customized to open a new output 
data file after the current active output file size exceeds 1 Gbyte. The status
line printed when tcpdump was terminated each day never indicated that any 
packets were dropped. Data collected to evaluate intrusion detection systems 
include this network sniffing data, Solaris Basic Security Module (BSM) audit 
data collected from the Solaris host, Windows NT audit event logs collected from 
the Windows NT host, nightly listings of all files on the four victim machines, 
and nightly dumps of security-related files on all victim machines. 

New features in the 1999 off-line evaluation include the Windows NT victim 
machine and associated attacks and audit data. These were added due to in- 
creased reliance on Windows NT systems by the military. Inside attacks, inside 
sniffer data to detect these attacks, and stealthy attacks were also added due to the
dangers posed by inside attacks and an emphasis on sophisticated attackers who
can carefully craft attacks to look like normal traffic. In addition, an analysis of
misses and high-scoring false alarms was performed for each system to determine
why systems miss specific attacks.

Fig. 2. Average connections per day for dominant TCP services

The 1999 evaluation was designed primarily to measure the ability of systems 
to detect new attacks without first training on instances of these attacks. The 
previous 1998 evaluation had demonstrated that systems could not detect new 
attacks well. The new 1999 evaluation was designed to evaluate enhanced systems 
which can detect new attacks and to analyze why systems miss new attacks. 
Many new attacks were thus developed and only examples of a few of these were 
provided in training data. 



3 Test Bed Network and Background Traffic 

The test bed architecture shown in Figure 1 is a basic foundation that is be- 
coming more complex as the evaluations progress. It was designed to simplify 
network administration and to support attack and background traffic generation 
and also instrumentation required to collect input data required by intrusion
detection systems. This flat network architecture is not representative of an Air
Force base. It is a minimal network designed to support intrusion detection 
systems that desired to participate in 1998 and 1999, attack types of interest, 
and most of the network traffic types seen across many Air Force bases. Future 
evaluations may include more complex networks including firewalls and other 
protective devices. 

Background traffic was generated in the test bed for a variety of reasons. 
This traffic made it possible to measure baseline false alarm rates of evaluated 
intrusion detection systems and to deter the development of limited non-robust 
intrusion detection systems that simply trigger when a particular traffic type 
occurs. It also led to reasonably high data rates and a fairly rich set of traffic 
types that exercise traffic handling and analysis capabilities of network analysis 
and intrusion detection tools tested with evaluation corpora. Finally, the
synthesized nature of the traffic allows widespread and relatively unrestricted access to
the evaluation corpora. False alarm rates measured with the evaluation corpus 
may not represent operational false alarm rates at any location. As noted in [17], 
network traffic varies widely with location and time. This implies that it may be 
difficult to predict the false alarm rates at operational sites from false alarm rates 
measured during any evaluation because traffic characteristics, including details 
that affect false alarm rates, are likely to differ widely from those used in the 
evaluation. The approach taken in the test bed was to generate realistic traffic 
that is roughly similar to traffic measured on one Air Force base in early 1998. In 
addition, details of this traffic (e.g. the frequency of occurrence of words in mail, 
telnet sessions, and FTP file transfers) were designed to produce false alarm rates 
similar to operational rates obtained in 1998 using the Air Force ASIM intrusion 
detection system (ASIM is similar to the Network Security Monitor described 
in [5]). False alarm rates measured with this traffic can be used to benchmark
or compare intrusion detection systems on reference evaluation background traf- 
fic corpora. They may not, however, be representative of false alarm rates on 
operational data. Supplementary measurements using restricted-access data are 
necessary to determine operational false alarm rates. Traffic characteristics of 
test bed background traffic that were similar to characteristics of measured Air 
Force base traffic include the following: 

1. The overall traffic level in connections per day.

2. The number of connections per day for the dominant TCP services. 

3. The identity of many web sites that are visited by internal users.

4. The average time-of-day variation of traffic as measured in 15-minute inter- 
vals. 

5. The general purpose of telnet sessions. 

6. The frequency of usage of UNIX commands in telnet sessions. 

7. The use of the UNIX time command to obtain an accurate remote time 
reference. 

8. The frequency of occurrence of ASIM keywords in telnet sessions, mail mes- 
sages, and files downloaded using FTP. 

9. The frequency of occurrence of users mistyping their passwords. 

10. Inclusion of an SQL database server that starts up automatically after a user 
telnets to a remote server.

Custom software automata in the test bed simulate hundreds of program- 
mers, secretaries, managers, and other types of users running common UNIX 
and Windows NT application programs. Automata interact with high-level user 
application programs such as Netscape, lynx, mail, ftp, telnet, ssh, irc, and ping
or they implement clients for network services such as HTTP, SMTP, and POP3.
Low-level TCP/IP protocol interactions are handled by kernel software and are 
not simulated. The average number of background-traffic bytes transmitted per 
day between the inside and outside of this test bed is roughly 411 Mbytes per 
day, with most of the traffic concentrated between 8:00 AM and 6:00 PM. The 
dominant protocols are TCP (384 Mbytes), UDP (26 Mbytes), and ICMP (98
Kbytes). These traffic rates are low compared to current rates at some large
commercial and academic sites. They are representative of 1998 Air Force data
and they also lead to sniffed data file sizes that can still be transported over
the Internet without practical difficulties. Figure 2 shows the average number
of connections per day for the most common TCP services. As can be seen,
web traffic dominates but many other types of traffic are generated which use a
variety of services.

Table 1. Major types of network services and automaton session types generated
to create background traffic in the test bed

Protocol  Session Type               Summary
Finger    Remote Work                Verify remote user name using finger before
                                     sending email.
FTP       FTP                        Get/Put files on internal Eyrie FTP servers.
HTTP      Lynx Eyrie Browser         Browse Eyrie internal web servers using UNIX
                                     command-line lynx browser.
HTTP      Eyrie Browsers             Multi-browser automaton emulates users
                                     accessing Eyrie web sites with various
                                     browsers.
HTTP      Internet Browsers          Multi-browser automaton emulates users
                                     accessing Internet web sites with various
                                     browsers.
HTTP      Netscape Internet Browser  Windows NT user accesses external web sites
                                     using Netscape browser.
ICMP      Remote Work                Verify remote host is on line using ping.
IRC       IRC                        Users participate in an IRC chat-room,
                                     external to Eyrie.
POP3      POP3                       Internal users use POP3 to access their email
                                     from external mail servers.
SMTP      Sendmail                   Individual, group, and global email messages
                                     to and from all users.
SSH       Remote Work                External users use ssh to connect to internal
                                     Eyrie hosts and perform daily, work-related
                                     tasks.
SNMP      SNMP                       External AF host monitors Eyrie router and
                                     hosts.
Telnet    Remote Work                External users telnet to internal Eyrie hosts
                                     to perform daily, work-related tasks.
Telnet    Mailread                   Users telnet to internal and external hosts
                                     to check their email using UNIX mail program.
Telnet    SQL                        Users telnet to an internal Eyrie SQL server
                                     and query the database.
Time      Time                       Periodic query to external time reference
                                     site.
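
The daily volumes quoted above can be converted to average bit rates with a
small calculation. This is only a sketch: it assumes that one Mbyte is 10^6
bytes, and the workday figure is an upper bound obtained by assuming that all
of the traffic, rather than most of it, falls between 8:00 AM and 6:00 PM.

```python
BYTES_PER_MBYTE = 1_000_000          # assumption: decimal megabytes
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WORKDAY = 10 * 60 * 60   # 8:00 AM to 6:00 PM

daily_bytes = 411 * BYTES_PER_MBYTE  # inside/outside background traffic per day
daily_bits = daily_bytes * 8

avg_bps_full_day = daily_bits / SECONDS_PER_DAY
avg_bps_workday = daily_bits / SECONDS_PER_WORKDAY  # upper bound, see lead-in

tcp_share = 384 / 411  # fraction of the daily volume carried by TCP

print(f"average over 24 hours: {avg_bps_full_day / 1000:.1f} kbit/s")
print(f"workday upper bound:   {avg_bps_workday / 1000:.1f} kbit/s")
print(f"TCP share of volume:   {tcp_share:.1%}")
```

Under these assumptions the average rate is a few tens of kilobits per second,
which is consistent with the statement that the rates are low compared to
those of large commercial and academic sites.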

Table 1 shows the many types of user sessions generated by automata and 
the types of network traffic these sessions create. As can be seen, user automata 
send and receive mail, browse web sites, send and receive files using the FTP 
protocol, use telnet and ssh to log into remote computers and perform work, 
monitor the router remotely using SNMP, and perform other tasks. For example,
Table 1 shows that four different automata are used to generate HTTP traffic.
The lynx command-line browser is used during telnet and console sessions to
access internal Eyrie web sites, a multi-browser automaton which emulates many
types of browsers including Netscape Navigator and Internet Explorer is used to
browse both internal and external web sites, and a JavaScript browser that runs
inside Netscape Navigator browses external web sites from the internal Windows
NT host.

Fig. 3. Number of HTTP connections measured in 15 minute intervals generated
by the four types of web automaton during Tuesday of the third week of training

Table 1 also shows that three automata are used to generate telnet sessions. 
First, remote programmers, secretaries, and administrators connect into internal 
Eyrie machines to work throughout the day using telnet or SSH. Characteristics 
of these work sessions including the frequency of occurrence of different UNIX 
commands issued, files accessed, the number of sessions per day, and the start 
time and duration of sessions are assigned probabilistically depending on the 
user type. A second telnet automaton simulates users who telnet to hosts to read 
and respond to mail using the UNIX mail program. The final telnet automaton 
simulates users who access an SQL database on an internal database machine. 
This machine automatically opens an SQL database server program, instead of a 
shell, after successful telnet logins. In addition to automatic traffic, the test bed 
allows human actors to generate background traffic and attacks when the traffic 
or attack is too complex to automate. For example, human actors performed 
attacks that included remote X-Windows Netscape browser displays.

Traffic varies over each simulation day to produce roughly the same average 
overall traffic rates in 15-minute intervals as measured in one week of operational 
Air Force traffic. Figure 3 shows the number of HTTP connections generated by 
the four browsing automata from Table 1 in one day of test bed traffic. Start 
times of browsing sessions are chosen using a Poisson process model with a
time-dependent rate parameter, and times between browsing actions within a session
also have independent exponential distributions. Each browsing session accesses 
from 1 to 50 web pages. The model of human typing provided in expect is used 
for typing responses in telnet and other sessions where users normally provide 
responses from a keyboard. As can be seen in Figure 3, traffic rates are highest 
during the middle of the 8:00 AM to 6:00 PM workday and low after these hours. 
These plots vary with time in a similar manner for telnet and other session types, 
except the maximum number of sessions is scaled down. 
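
The session timing model described above can be sketched as follows. This is
not the test bed's own code: the rate function, its peak value, and the mean
gap between browsing actions are hypothetical, and the thinning method used to
sample the time-dependent Poisson process is one standard choice among
several. Only the 1 to 50 pages per session and the exponential gaps come
from the text.

```python
import math
import random


def rate_per_hour(hour_of_day: float) -> float:
    """Hypothetical session start rate: low at night, peaking mid-workday."""
    if 8.0 <= hour_of_day < 18.0:
        return 2.0 + 18.0 * math.sin(math.pi * (hour_of_day - 8.0) / 10.0)
    return 2.0


MAX_RATE = 20.0  # upper bound on rate_per_hour, required by thinning


def session_start_times(rng: random.Random, day_hours: float = 24.0) -> list:
    """Sample start times (in hours) from a Poisson process with varying rate.

    Candidate events are drawn at the constant rate MAX_RATE and each is kept
    with probability rate_per_hour(t) / MAX_RATE (thinning).
    """
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(MAX_RATE)
        if t >= day_hours:
            return times
        if rng.random() < rate_per_hour(t) / MAX_RATE:
            times.append(t)


def browsing_session(rng: random.Random, start_hour: float,
                     mean_gap_seconds: float = 30.0) -> list:
    """Return page request times (seconds since midnight) for one session."""
    n_pages = rng.randint(1, 50)  # each session accesses from 1 to 50 pages
    request_times = []
    t = start_hour * 3600.0
    for _ in range(n_pages):
        request_times.append(t)
        t += rng.expovariate(1.0 / mean_gap_seconds)  # exponential gap
    return request_times


rng = random.Random(1999)
starts = session_start_times(rng)
first_session = browsing_session(rng, starts[0]) if starts else []
print(len(starts), "sessions;", len(first_session), "pages in the first one")
```

Counting the generated start times in 15-minute bins gives a curve with the
same general shape as the one described for Figure 3, high in the middle of
the workday and low outside it.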

Table 2. Probe and Denial of Service (DoS) attacks

            Solaris      SunOS         NT           Linux          All
Probe (37)  portsweep    portsweep     ntinfoscan   lsdomain       illegal-sniffer
            queso        queso         portsweep    mscan          ipsweep
                                                    portsweep      portsweep
                                                    queso
                                                    satan

DoS (65)    neptune      arppoison     arppoison    apache2
            selfping     mailbomb      crashiis     arppoison
            tcpreset     processtable  dosnuke      back
                                                    mailbomb
                                                    neptune
                                                    processtable
                                                    smurf
                                                    tcpreset
                                                    teardrop
                                                    udpstorm


4 Attacks 

Twelve new Windows NT attacks were added in 1999 along with stealthy ver- 
sions of many 1998 attacks, new inside console-based attacks, and six new UNIX 
attacks. The 56 different attack types shown in Tables 2 and 3 were used in 
the evaluation. Attacks in normal font in these tables are old attacks from 1998 
executed in the clear (114 instances). Attacks in italics are new attacks devel- 
oped for 1999 (62 instances), or stealthy versions of attacks used in 1998 (35 
instances). Details on attacks including further references and information on 
implementations are available in [2,8,9,13,14]. Five major attack categories and 
the attack victims are shown in Tables 2 and 3. Primary victims listed along the 
top of these tables are the four inside victim hosts, shown in the gray box of 
Figure 1, and the Cisco router. In addition, some probes query all machines in a 
given range of IP addresses as indicated by the column labeled “all” in Table 2. 

The upper row of Table 2 lists probe or scan attacks. These attacks automat- 
ically scan a network of computers or a DNS server to find valid IP addresses 
(ipsweep, lsdomain, mscan), active ports (portsweep, mscan), host operating
system types (queso, mscan), and known vulnerabilities (satan). All of these probes
except two (mscan and satan) are either new in 1999 (e.g. ntinfoscan, queso,
illegalsniffer) or are stealthy versions of 1998 probes (e.g. portsweep, ipsweep).
Probes are considered stealthy if they issue ten or fewer connections or packets 
or if they wait longer than 59 seconds between successive network transmissions. 
The new “illegal-sniffer” attack is different from the other probes. During this 
attack, a Linux sniffer machine is installed on the inside network running the 
tcpdump program in a manner that creates many DNS queries from this new 
and illegal IP address. 
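
The stealthiness criterion stated above can be written as a simple predicate.
This is a sketch of one reading of the rule, assuming that "wait longer than
59 seconds" must hold for every pair of successive transmissions; the function
name and the example timestamps are hypothetical.

```python
MAX_STEALTHY_TRANSMISSIONS = 10   # "ten or fewer connections or packets"
MIN_STEALTHY_GAP_SECONDS = 59.0   # gaps must be longer than this


def is_stealthy_probe(transmission_times) -> bool:
    """Classify a probe from the timestamps (seconds) of its transmissions."""
    times = sorted(transmission_times)
    if len(times) <= MAX_STEALTHY_TRANSMISSIONS:
        return True
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    # Assumption: every gap between successive transmissions must exceed 59 s.
    return all(gap > MIN_STEALTHY_GAP_SECONDS for gap in gaps)


fast_sweep = [float(i) for i in range(100)]        # 100 packets, 1 s apart
slow_sweep = [60.0 * i for i in range(100)]        # 100 packets, 60 s apart
short_probe = [0.0, 0.5, 1.0]                      # only three packets

print(is_stealthy_probe(fast_sweep),
      is_stealthy_probe(slow_sweep),
      is_stealthy_probe(short_probe))
```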

The second row of Table 2 contains denial of service (DoS) attacks designed 
to disrupt a host or network service. New 1999 DoS attacks crash the Solaris 
operating system (selfping), actively terminate all TCP connections to a specific 
host (tcpreset), corrupt ARP cache entries for a victim not in others' caches
(arppoison), crash the Microsoft Windows NT web server (crashiis), and crash 
Windows NT (dosnuke). 

The first row of Table 3 contains Remote to Local (R2L) attacks. In these
attacks, an attacker who does not have an account on a victim machine gains
local access to the machine (e.g. guest, dict), exfiltrates files from the machine
(e.g. ppmacro), or modifies data in transit to the machine (e.g. framespoof).
New 1999 R2L attacks include an NT PowerPoint macro attack (ppmacro),
a man-in-the-middle web browser attack (framespoof), an NT trojan-installed
remote-administration tool (netbus), a Linux trojan SSH server (sshtrojan), and
a version of a Linux FTP file-access utility with a bug that allows remote
commands to run on a local machine (ncftp).

The second row of Table 3 contains user to root (U2R) attacks where a local 
user on a machine is able to obtain privileges normally reserved for the UNIX 
super user or the Windows NT administrator. All five NT U2R attacks are new 
this year and all other attacks except one (xterm) are versions of 1998 UNIX 
U2R attacks that were redesigned to be stealthy to network-based intrusion 
detection systems evaluated in 1998. These stealthy attacks are described below.

The bottom row in Table 3 contains Data attacks. The goal of a Data attack
is to exfiltrate special files, which the security policy specifies should remain on 
the victim hosts. These include “secret” attacks where a user who is allowed to 
access the special files exfiltrates them via common applications such as mail 
or FTP, and other attacks where privilege to access the special files is obtained 
using a U2R attack (ntfsdos, sqlattack). Note that an attack could be labeled 
as both a U2R and a Data attack if one of the U2R attacks was used to obtain 
access to the special files. The “Data” category thus specifies the goal of an 
attack rather than the attack mechanism. 

4.1 Stealthy U2R Attacks 

UNIX U2R attacks were made stealthy to network-based intrusion detection 
systems using a variety of techniques designed to hide attack-specific keywords 
from network-based sniffers [2,13]. Most stealthy U2R attacks included the com- 
ponents shown by the five columns in Figure 4. Attack scripts were first encoded, 
transported to the victim machine, and then decoded and executed. Actions such 
as altering or accessing secret or security-related files were performed and the 
attacker then removed files created for the attack and restored original
permissions of altered or accessed files to clean up. The dark filled-in actions in
Figure 4 show one particular stealthy attack. In this attack, the clear-text attack
script is encoded by “character stuffing” where extra unique characters (e.g. “AA”)
are added after every original character, the attack script is transported to the 
victim machine using FTP, the attack script is decoded using vi (not shown, 
but implicit), attack execution is hidden by generating screens full of chaff text 
directed to the standard output from a background process, and the attacker 
changes file permission on a secret file, displays the file, and then restores file 
permissions back to original settings and erases the attack script. As can be 
seen from Figure 4, there are many other possible variants of stealthy attacks. 
Five approaches were used to encode/decode and transport attack scripts and 
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Table 3. Remote to Local (R2L), User to Root (U2R), and Data attacks 
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to execute these scripts. The encode action “Octal Characters” refers to encod- 
ing binary files using the C print/ octal backslash notation and then decoding 
the binary file using the tcsh builtin echo command. The execute action “Shell 
Variables” refers to encoding shell commands using shell variables to obscure 
the commands that are issued. The execute action “Delay Execution” refers to 
using cron or at to execute scripts at a later time after the session that created 
the attack script and “Multiple Sessions” refers to downloading, decoding, and 
running the attack script over multiple sessions. Further details and examples of 
other actions are available in [2,13]. 
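
As a minimal sketch of why character stuffing hides keywords from a sniffer that matches text verbatim, the following Python fragment stuffs a filler string after every character and then removes it again. The filler “AA” follows the example in the text; the function names, the naive keyword rule, and the sample string are hypothetical and are not taken from the evaluation.

```python
def stuff(text: str, filler: str = "AA") -> str:
    """Insert the filler string after every original character."""
    return "".join(ch + filler for ch in text)


def unstuff(encoded: str, filler: str = "AA") -> str:
    """Recover the original text by keeping one character per stuffed group."""
    step = 1 + len(filler)
    return encoded[::step]


def keyword_alert(payload: str, keyword: str) -> bool:
    """Naive sniffer rule (assumed): alert when the keyword appears verbatim."""
    return keyword in payload


script = "chmod 777 secret"      # illustrative clear-text line
encoded = stuff(script)          # what a sniffer would see in transit
```

In this sketch the clear-text line triggers the keyword rule while the stuffed form does not, even though the original is trivially recovered on the victim host.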

Stealthy techniques that rely on packet fragmentation and other forms of 
packet manipulation [18] were implemented as part of the 1999 evaluation. Time 
constraints and the variety of victim operating systems used precluded exten- 
sive experimentation with these approaches. Preliminary exploratory results are 
provided in [2]. 

5 Participants and Scoring 

Eight research groups participated in the evaluation using a variety of approaches 
to intrusion detection. Papers by these groups describing high-performing sys- 
tems are provided in [4,7,15,20,21,22,23]. One requirement for participation in 
the evaluation was the submission of a detailed system description that was 
used for scoring and analysis. System descriptions described the types of attacks 
the system was designed to detect, data sources used, features extracted, and 
whether optional attack identification information was provided as an output. 
Most systems used network sniffer data to detect Probe and DoS attacks against 
all systems [7,15,21,23] or BSM Solaris host audit data to detect Solaris R2L and 
U2R attacks [4,15,23]. Two systems produced a combined output from both net- 
work sniffer data and host audit data [15,23]. A few systems used network sniffer 
data to detect R2L and U2R attacks against the UNIX victims [15,23]. One sys- 
tem used NT audit data to detect U2R and R2L attacks against the Windows 
NT victim [20] and two systems used BSM audit data to detect Data attacks 
against the Solaris victim [15,23]. A final system used information from a nightly
file system scan to detect R2L, U2R, and Data attacks against the Solaris
victim [22]. The software program that performs this scan was the only custom
auditing tool used in the evaluation. A variety of approaches were employed,
including expert systems that use rules or signatures to detect attacks, anomaly
detectors, pattern classifiers, recurrent neural networks, data mining techniques,
and a reasoning system that performs a forensic analysis of the Solaris file
system.

Fig. 4. Possible paths to generate stealthy user to root (U2R) attacks. Each
attack requires selection of one or more of the alternate approaches shown in
each column

Three weeks of training data, composed of two weeks of background traffic 
with no attacks and one week of background traffic with a few attacks, were 
provided to participants from mid May to mid July 1999 to support system 
tuning and training. Only five weekdays of traffic were provided for each week. 
Locations of attacks in the training data were clearly labeled. Two weeks of 
unlabeled test data were provided from late September to the middle of October. 
Participants downloaded this data from a web site, processed it through their 
intrusion detection systems, and generated putative hits or alerts at the output of 
their intrusion detection systems. Lists of alerts were due back by early October. 
In addition, participants could optionally return more extensive identification 
lists for each attack. 

A simplified approach was used in 1999 to label attacks and score alerts and 
new scoring procedures were added to analyze the optional identification lists. In 
1998, every network TCP/IP connection, UDP packet, and ICMP packet was la- 
beled, and participants determined which connections and packets corresponded 
to attacks. Although this approach pre-specifies all potential attack packets and 
thus simplifies scoring and analysis, it can make submitting alerts difficult be- 
cause aligning alerts with the network connections and packets that generate 
alerts is often complex. In addition, this approach cannot be used with inside 
attacks that generate no network traffic. In 1999, a new simplified approach was
adopted. Each alert only had to indicate the date, time, victim IP address, and 
score for each putative attack detection. An alert could also optionally indicate 
the attack category. This was used to assign false alarms to attack categories. 
Putative detections returned by participants were counted as true “hits” or true 
detections if the time of any alert occurred during the time of any attack segment 
and the alert was for the correct victim IP address. Alerts that occurred outside
all attack segments were counted as “misses” or false alarms. Attack segments
generally correspond to the duration of all network packets and connections gen- 
erated by an attack and to time intervals when attack processes are running on a 
victim host. To account for small timing inconsistencies across hosts, an extra 60
seconds of leeway was typically allowed for alerts before the start and after the
end of each attack segment. The analysis of each system only included attacks
which that system was designed to detect, as specified in the system description.
Systems were not penalized for missing attacks they were not designed to detect and false
alarms that occurred during segments of out-of-spec attacks were ignored. 

The score produced by a system was required to be a number that increases 
as the certainty of an attack at the specified time increases. All participants re- 
turned numbers ranging between zero and one, and many participants produced 
binary outputs (0s and 1s only). If alerts occurred in multiple attack segments
of one attack, then the score assigned to that attack for further analysis was 
the highest score in all the alerts. Some participants returned optional identifi- 
cation information for attacks. This included the attack category, the name for 
old attacks selected from a list of provided names, and the attack source and 
destination IP addresses, start time, duration, and the ports/services used. This 
information was analyzed separately from the alert lists used for detection scor- 
ing. Results in this paper focus on detection results derived from the required 
alert lists. Information on identification results is provided in [11]. 
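
The hit, false alarm, and highest-score rules described above can be sketched as follows. This is an illustration under stated assumptions, not the scoring software used in the evaluation: the class and function names are hypothetical, times are plain seconds, an alert for the wrong victim is assumed to count as a false alarm, and the exclusion of out-of-spec attacks is not modeled.

```python
from dataclasses import dataclass

LEEWAY = 60.0  # seconds of slack around each attack segment


@dataclass(frozen=True)
class Segment:
    attack_id: str
    victim_ip: str
    start: float  # seconds since an arbitrary reference time
    end: float


@dataclass(frozen=True)
class Alert:
    time: float
    victim_ip: str
    score: float


def matching_attacks(alert, segments, leeway=LEEWAY):
    """Return ids of attacks with a segment covering the alert time and victim."""
    return {
        s.attack_id
        for s in segments
        if s.victim_ip == alert.victim_ip
        and s.start - leeway <= alert.time <= s.end + leeway
    }


def score_alerts(alerts, segments, leeway=LEEWAY):
    """Split alerts into per-attack scores (hits) and a list of false alarms."""
    attack_scores = {}
    false_alarms = []
    for alert in alerts:
        hits = matching_attacks(alert, segments, leeway)
        if not hits:
            false_alarms.append(alert)
        for attack_id in hits:
            # An attack keeps the highest score among all alerts that hit it.
            previous = attack_scores.get(attack_id, 0.0)
            attack_scores[attack_id] = max(previous, alert.score)
    return attack_scores, false_alarms


# Made-up data: one attack with two segments, and four alerts.
segments = [
    Segment("a1", "10.0.0.1", 100.0, 200.0),
    Segment("a1", "10.0.0.1", 500.0, 600.0),
]
alerts = [
    Alert(50.0, "10.0.0.1", 0.4),   # inside the leeway before the first segment
    Alert(550.0, "10.0.0.1", 0.9),  # inside the second segment
    Alert(300.0, "10.0.0.1", 1.0),  # between segments: false alarm
    Alert(150.0, "10.0.0.2", 0.8),  # wrong victim: false alarm
]
attack_scores, false_alarms = score_alerts(alerts, segments)
```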

Attack labels were used to designate attack segments in the training data 
and also to score lists of alerts returned by participants. Attack labels were 
provided using list files similar to those used in 1998, except a separate list file 
was provided for each attack specifying all segments of that attack. Entries in 
these list files include the date, start time, duration, a unique attack identifier, 
the attack name, source and destination ports and IP addresses, the protocol, 
and details concerning the attack. Details include indications that the attack is 
clear or stealthy, old or new, inside or outside, the victim machine type, and 
whether traces of the attack occur in each of the different data types that were 
collected. Attack list files are available at [14]. 

6 Results 

Table 4. Poorly detected attacks where the best system for each attack detects
half or fewer of the attack instances

An initial analysis was performed to determine how well all systems taken
together detect attacks regardless of false alarm rates. The best system was selected
for each attack as the system that detects the most instances of that attack. The
detection rate for these best systems provides a rough upper bound on composite
system performance. Thirty-seven of the 58 attack types were detected well
by this composite system, but many stealthy and new attacks were always or 
frequently missed. Poorly detected attacks for which half or more of the attack 
instances were not detected by the best system are listed in Table 4. This table 
lists the attack name, the attack category, details concerning whether the attack 
is old, new, or stealthy, the total number of instances for this attack, and the 
number of instances detected by the system which detected this attack best. 
Table 4 contains 21 attack types and is dominated by new attacks and attacks 
designed to be stealthy to 1998 network-based intrusion detection systems. All 
instances of 10 of the attack types in Table 4 were totally missed by all systems. 
These results suggest that the new systems developed for the 1999 evaluation 
still are not detecting new attacks well and that stealthy probes and U2R attacks 
can avoid detection by network-based systems. 
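
The composite analysis above amounts to picking, for each attack type, the system with the most detected instances and then listing the attack types for which that best count is half or fewer of the instances. A minimal sketch follows; the function names, system names, attack names, and counts are all invented for illustration and are not results from the evaluation.

```python
def best_system_per_attack(detected, instances):
    """Pick, for every attack type, the system that detects the most instances.

    detected:  {system name: {attack name: number of instances detected}}
    instances: {attack name: total number of instances}
    Returns {attack name: (best system, instances detected by it)}.
    """
    best = {}
    for attack in instances:
        best[attack] = max(
            ((name, counts.get(attack, 0)) for name, counts in detected.items()),
            key=lambda pair: pair[1],
        )
    return best


def poorly_detected(best, instances):
    """Attack types where the best system detects half or fewer instances."""
    return sorted(
        attack
        for attack, (_, count) in best.items()
        if 2 * count <= instances[attack]
    )


# Invented counts for three hypothetical attack types and two systems.
instances = {"attack_a": 10, "attack_b": 4, "attack_c": 5}
detected = {
    "network_ids": {"attack_a": 9, "attack_b": 0},
    "host_ids": {"attack_a": 2, "attack_b": 2, "attack_c": 0},
}
best = best_system_per_attack(detected, instances)
poor = poorly_detected(best, instances)
```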

Further analyses evaluated system performance at false alarm rates in a spec- 
ified range. The detection rate of each system at different false alarm rates can 
be determined by lowering a threshold from above 1.0 to below 0.0, counting 
the detections with scores above the threshold as hits, and counting the number 
of alerts above the threshold that do not detect attacks as false alarms. This 
results in one or more operating points for each system which trade off false 
alarm rate against detection rate. It was found that almost all systems, except 
some anomaly detection systems, achieved their maximum detection accuracy at 
or below 10 false alarms per day on the 1999 corpus. These low false alarm rates 
were presumably due to the low overall traffic volume, the relative stationarity of 
the traffic, and the ability to tune systems to reduce false alarms on three weeks 
of training data. In the remainder of this paper, the detection rate reported for
each system is the highest detection rate achieved at or below 10 false alarms 
per day on the two weeks of test data. 
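
The threshold sweep described above can be sketched as follows, assuming each detected attack has already been reduced to a single score and each false alarm carries its own score. The function names and the sample numbers are hypothetical. Using every distinct score as a threshold with a greater-or-equal comparison is equivalent to placing the threshold just below that score.

```python
def operating_points(attack_scores, false_alarm_scores, total_attacks, days):
    """Sweep a threshold downward; return (false alarms per day, detection rate).

    attack_scores:      one score per detected attack
    false_alarm_scores: one score per alert that matched no attack
    """
    points = [(0.0, 0.0)]  # threshold above every score: nothing is reported
    thresholds = sorted(set(attack_scores) | set(false_alarm_scores), reverse=True)
    for t in thresholds:
        hits = sum(1 for s in attack_scores if s >= t)
        false_alarms = sum(1 for s in false_alarm_scores if s >= t)
        points.append((false_alarms / days, hits / total_attacks))
    return points


def best_detection_rate(points, max_false_alarms_per_day=10.0):
    """Highest detection rate among operating points within the false alarm limit."""
    eligible = [rate for fa, rate in points if fa <= max_false_alarms_per_day]
    return max(eligible, default=0.0)


# Made-up example: 4 attacks in total, 3 of them detected, over 2 days of data.
points = operating_points(
    attack_scores=[1.0, 0.8, 0.3],
    false_alarm_scores=[0.9, 0.3, 0.3, 0.2],
    total_attacks=4,
    days=2,
)
```

A system with binary outputs yields a single non-trivial operating point in this sweep, which is consistent with the remark that the procedure produces one or more operating points per system.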

Table 5 shows average detection rates at 10 false alarms per day for each 
attack category and victim type. This table provides overall results and does 
not separately analyze old, new, and stealthy attacks. The upper number in a 
cell, surrounded by dashes, is the number of attack instances in that cell and the 
other entries provide the percent correct detections for all systems with detec- 
tion rates above 40% in that cell. A cell contains only the number of instances 
if no system detected more than 40% of the instances. Only one entry is filled 
for the bottom row because only probe attacks were launched against all the victim
machines, and the SunOS/Data cell is empty because there were no Data attacks
against the SunOS victim. High-performance systems listed in Table 5 include 
rule-based expert systems that use network sniffing data and/or Solaris BSM 
audit data (Expert-1 through Expert-3 [15,23,21]), a data mining system that 
uses network sniffing data (Dmine [7]), a pattern classification approach that 
uses network sniffing data (Pclassify), an anomaly detection system which uses 
recurrent neural networks to analyze system call sequences in Solaris BSM audit 
data (Anomaly [4]), and a reasoning system which performs a nightly forensic 
analysis of the Solaris file system (Forensics [22]). 

No one approach or system provides the best performance across all categories.
The best performance is achieved for probe and denial-of-service attacks
by systems that use network sniffer data and for U2R and Data attacks against
the Solaris victim by systems that use BSM audit data. Detection rates for U2R
and Data attacks are generally poor for SunOS and Linux victims where exten- 
sive audit data is not available. Detection rates for R2L, U2R, and Data attacks 
are poor for Windows NT, which was included in the evaluation for the first 
time this year. 

Figure 5 shows the performance of the best intrusion detection system in 
each attack category at a false alarm rate of 10 false alarms per day. The left 
chart compares the percentage of attack instances detected for old-clear and 
new attacks and the right chart compares performance for old-clear and stealthy 
attacks. The numbers in parentheses on the horizontal axis below the attack 
category indicate the number of instances of attacks of different types. For
example, in Figure 5A, there were 49 old-clear and 15 new denial-of-service attacks.
Figure 5A demonstrates that detection of new attacks was much worse than
detection of old-clear attacks across all attack categories, and especially for DoS,
R2L, and U2R attacks. The average detection rate for old-clear attacks was 72%
and this dropped to 19% for new attacks. Figure 5B demonstrates that stealthy
probes and U2R attacks were much more difficult to detect for network-based 
intrusion detection systems that used sniffing data. User-to-root attacks against 
the Solaris victim, however, were accurately detected by host-based intrusion 
detection systems that used BSM audit data. 

Table 5. Percent attack instances detected for systems with a detection rate
above 40% in each cell and at false alarm rates below 10 false alarms per day

Attacks are detected best when they produce a consistent “signature,” trace,
or sequence of events in tcpdump data or in audit data that is different from
sequences produced for normal traffic. A detailed analysis by participants
demonstrated that attacks were missed for a variety of reasons. Systems which relied on
rules or signatures missed new attacks because signatures did not exist for these
attacks, and because existing signatures did not generalize to variants of old
attacks, or to new and stealthy attacks. For example, ncftp and ls_domain attacks
were visible in tcpdump data, but were missed because no rules existed to detect 
these attacks. Stealthy probes were missed because hard thresholds in rules were 
set to issue an alert only for more rapid probes, even though slow probes often 
provided as much information to attackers. These thresholds could be changed to 
detect stealthy probes at the expense of generating more false alarms. Stealthy 
U2R attacks were missed by network-based systems because attack actions were 
hidden in sniffer data and rules generated for clear versions of these attacks no 
longer applied. Many of the Windows NT attacks were missed due to lack of 
experience with Windows NT audit data and attacks. A detailed analysis of the 
Windows NT attacks [9] indicated that all but two of these attacks (ppmacro, 
framespoof) can be detected from the 1999 NT audit data using attack-specific 
signatures which generate far fewer than 10 false alarms per day. 

Fig. 5. Comparison of detection accuracy at 10 false alarms per day for (A)
Old-Clear versus New attacks and (B) Old-Clear versus Stealthy attacks



7 Predicting When New Attacks Will Be Detected

Many network sniffer-based intrusion detection systems missed attacks because 
particular protocols or services were not monitored or because services were not 
analyzed to the required depth. This is illustrated in Figure 6. The horizontal 
axis in this figure shows the protocols and services that were used for many of the 
probe and DoS attacks and the vertical axis shows the depth of analysis required 
to reliably determine the action performed by an attack. Attacks near the top 
of Figure 6 require only low-level analysis of single or multiple packet headers.
Attacks near the bottom of Figure 6 require understanding of the protocol used
to extract the connection content and high-level analysis of the content to
determine the action performed. Well-known attacks can be detected at lower levels
than shown when the attack produces a signature or trace at a lower level that is
distinct from background traffic. This approach is used in most signature-based
intrusion detection systems. Determining the intended action of a new attack, 
however, requires the depth of analysis shown. 

Attack names surrounded by white ovals in Figure 6 were detected well, 
while attacks surrounded by dark ovals were not. For example, many systems 
missed the “ARP Poison” attack on the bottom left because the ARP protocol 
was not monitored or because the attacker's duplicate responses to arp-who-has
requests were not detected. Many systems also missed the Illegal Sniffer
and LS_DOMAIN attacks on the left middle because the DNS service was not
monitored or because DNS traffic was not analyzed to determine either when an
“ls” command is successfully answered by a DNS server or when a DNS request
is sent from a new IP address. Many systems also missed the “SELF-PING” 
attack because telnet sessions were not reconstructed and commands issued in 
telnet sessions were not analyzed. Many of the attacks that were detected well 
required simpler high-level analysis of packet headers. For example, the “LAND” 
attack includes a UDP packet with the same source and destination IP address 
and the “TEARDROP” attack includes a mis-fragmented UDP packet. Other 
attacks that were detected well required sequential analysis of multiple packets
or deeper analysis of a particular protocol. For example, the “SATAN” and
“NTINFOSCAN” attacks include a large variety of connections that occur in a
short time interval, as do non-stealthy “IP SWEEPS” and “PORT SWEEPS”.

Fig. 6. Probe and DoS attacks displayed to show the services and protocols used
and the maximum depth of analysis of network traffic required to reliably detect
attacks. Attacks in white ovals were detected well by network-based systems,
attacks in dark ovals were not

The attack analysis shown in Figure 6 illustrates that two pieces of information
are required to predict whether a new attack will be missed by network-based
systems: evidence of the attack, or knowledge of where the attack manifests
itself in data sources, and knowledge of the input data and features used by the
intrusion detection system. This analysis can be extended to other
types of attacks and to host-based systems by analyzing the evidence an attack 
leaves on the victim host in audit records, log files, file system access times, and 
other locations. The general rule is that attacks will be missed if no evidence of 
the attack is available in data analyzed by the intrusion detection system or if 
necessary features are not extracted from this data. This may occur for many 
reasons. The required host-based data may not be available, network sensors 
may be in the wrong location to record attack trace components, required pro- 
tocols or services may not be analyzed, a new attack may require a novel type of 
feature extraction which is not yet included, or a stealthy attack may leave no 
traces in information analyzed. If traces of an attack are processed by an intru- 
sion detection system, then the attack may or may not be detected. Performance 
depends on the overlap with normal input features and details of the intrusion 
detection system. The analysis described above requires attack trace informa- 
tion and detailed intrusion detection system descriptions. It can be used as a 
preliminary analysis to determine which attacks an intrusion detection system 
may detect and can reduce the necessity of expensive experimentation. Network 
attack traces and system descriptions are available on the Lincoln Laboratory 
web site and included as part of the 1999 DARPA Intrusion Detection Evalu- 
ation corpus [14]. The traces list all network packets generated by each attack 
instance. 
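
The general rule stated above can be sketched as a set comparison between where an attack leaves evidence and what an intrusion detection system actually analyzes. This is a simplified illustration: the class names, the (data source, feature) encoding, and the feature labels are assumptions made for the example and do not reproduce the format of the attack traces or system descriptions in the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackTrace:
    name: str
    evidence: frozenset  # (data source, feature) pairs where the attack leaves traces


@dataclass(frozen=True)
class SystemDescription:
    name: str
    features: frozenset  # (data source, feature) pairs the system extracts


def predict(attack, system):
    """Apply the general rule: no visible evidence means the attack is missed.

    When some evidence is visible the outcome is only "possibly detected",
    because detection still depends on overlap with normal traffic and on
    details of the intrusion detection system.
    """
    visible = attack.evidence & system.features
    return "possibly detected" if visible else "missed"


# Hypothetical descriptions: a sniffer-based system that does not monitor ARP.
sniffer = SystemDescription(
    "sniffer_ids",
    frozenset({("network", "tcp_headers"), ("network", "telnet_commands")}),
)
arp_attack = AttackTrace("arp_poison", frozenset({("network", "arp_replies")}))
telnet_attack = AttackTrace(
    "self_ping",
    frozenset({("network", "telnet_commands"), ("host", "process_table")}),
)
```

Such a check is only a preliminary filter: it can rule out attacks that a system cannot see, but it cannot confirm that a visible attack will be detected.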

8 Discussion

The DARPA 1999 off-line intrusion detection evaluation successfully evaluated 
18 intrusion detection systems from 8 sites using more than 200 instances of 58 
attack types embedded in three weeks of training data and two weeks of test 
data. Attacks were launched against UNIX and Windows NT hosts and a Cisco 
router. Best detection was provided by network-based systems for old probe 
and old denial of service attacks and by host-based systems for Solaris user-to- 
root attacks launched either remotely or from the local console. A number of 
sites developed systems that detect known old attacks by searching for signa- 
tures in network sniffer data or Solaris BSM audit data using expert systems or 
rules. These systems detect old attacks well when they match known signatures, 
but miss many new UNIX attacks, Windows NT attacks, and stealthy attacks. 
Promising capabilities were provided by Solaris host-based systems which de- 
tected console-based and remote-stealthy U2R attacks, by anomaly detection 
systems which could detect some U2R and DoS attacks without requiring sig- 
natures, and by a host-based system that could detect Solaris U2R and R2L 
attacks without using audit information but by performing a forensic analysis of 
the Solaris file system. 

A major result of the 1998 and 1999 evaluations is that current research in- 
trusion detection systems miss many new and stealthy attacks. Despite the focus 
in 1999 on developing approaches to detect new attacks, all systems evaluated 
in 1999 completely missed 10 out of 58 attack types and, even after combining 
output alerts from all systems, 23 attack types were detected poorly (half or 
fewer instances of an attack type detected). Detailed analyses of individual
systems indicated that attacks were missed for many reasons. Input data sources
that contained evidence of attacks were sometimes not analyzed or were not
analyzed to the required depth, and rules, thresholds, or signatures created
for old attacks often did not generalize to new attacks. This result is relatively
independent of evaluation details because it depends only on attack traces and 
an analysis of why attacks were missed and how systems operate. An analysis 
of why attacks were missed suggested an analytic approach that can be used to 
predict whether an intrusion detection system will miss a particular new attack. 
It requires detailed attack traces and system descriptions to determine whether 
components of attack traces are contained in the inputs to an intrusion detection 
system and whether necessary features are extracted from these inputs. This an- 
alytic approach may be useful for designing future evaluations and reducing the 
need for experimentation. 

False alarm rate results of the 1999 evaluation should be interpreted within 
the context of the test bed and background traffic used. The evaluation used 
a simple network topology, a non-restrictive security policy, a limited number 
of victim machines and intrusion detection systems, stationary and low-volume 
background traffic, lenient scoring, and extensive instrumentation to provide 
inputs to intrusion detection systems. Most systems had low false alarm rates 
(well below 10 false alarms per day). As noted above, these low rates may be 
caused by the use of relatively low-volume background traffic with a time-varying
but relatively fixed proportion of different traffic types and by the availability of
training data to tune or train systems. 

Extensions to the current evaluation are planned to verify false alarm rates 
using operational network traffic and a small number of high-performing sys- 
tems. Operational measurements will also be made to update traffic statistics 
and traffic generators used in the test bed. Further evaluations are also required 
to explore performance with commercial and updated research intrusion detec- 
tion systems, with more complex network topologies, with a wider range of at- 
tacks, and with more complex background traffic. In addition, other approaches 
to making attacks stealthy should be explored including low-level packet modi- 
fications (e.g. [18]) and attacks which remove evidence from Windows NT and 
Solaris BSM audit records and other system audit logs before terminating. 

Comprehensive evaluations of DARPA research systems have now been per- 
formed in 1998 and 1999. These evaluations take time and effort on the part of 
the evaluators and the participants. They have provided benchmark measurements
that do not need to be repeated until system developers are
able to implement many desired improvements. The current planned short-term
focus in 2000 is to provide assistance to intrusion detection system developers to 
advance their systems and not to evaluate performance. System development can 
be expedited by providing descriptions, traces, and labeled examples of many 
new attacks, by developing threat and attack models, and by carefully evaluating 
COTS systems to determine where to focus research efforts. 

A number of research directions are suggested by 1999 results. First, re- 
searchers should focus on anomaly detection and other approaches that have 
the potential of detecting new attacks. Second, techniques should be developed 
to process Windows NT audit data. Third, host-based systems should not rely
exclusively on C2-level audit data such as Solaris BSM data or NT audit data. 
Instead other types of host and network input features should also be explored. 
These could be provided by new system auditing software, by firewall or router 
audit logs, by SNMP queries, by software wrappers, by commercial intrusion
detection system components, by forensic analysis of file-system changes as in [22],
or by application-specific auditing. Fourth, research efforts should not overlap 
but should provide missing functionality. Finally, a greater breadth of analysis 
is required including a wider range of protocols and services. 

Abstract. After more than a decade of development, there are
now many commercial and non-commercial intrusion-detection systems 
(IDSes) available. However, they tend to generate false alarms at high 
rates while overlooking real threats. The results described in this paper 
have been obtained in the context of work that aims to identify means for 
supporting the analysis, evaluation, and design of large-scale intrusion- 
detection architectures. We propose a practical method for evaluating 
IDSes and identifying their strengths and weaknesses. Our approach 
shall allow us to evaluate IDSes for their capabilities, unlike existing 
approaches that evaluate their implementation. It is furthermore shown 
how the obtained knowledge can be used to analyze and evaluate an IDS. 



1 Introduction 

In the past few years, an increasing number of intrusion-detection systems 
(IDSes) have become available [1]. This development has been driven by the 
growing number of computer security incidents [2, 3, 4, 5, 6, 7, 8] that demonstrate 
the need for organizations to protect their network against adversaries [9]. The 
issue of protecting networks and making them secure and reliable has been ad- 
dressed in many publications that have analyzed the problems and made per- 
tinent recommendations [10,11]. Intrusion detection (ID) is widely regarded as 
being part of the solution for protecting today’s networks. However, IDSes may 
fail by generating false alarms or not recognizing attacks. This, together with 
the fact that today’s networks are not only distributed but also highly hetero- 
geneous, makes it desirable to deploy multiple instances of different types of 
IDSes in order to achieve adequate protection of such networks. Last but not 
least such an ID architecture embodying multiple IDSes has to achieve adequate 
compliance with an organization’s security policy. 

The work described in the following is motivated by the fact that IDSes tend 
to generate large amounts of alarms (reports of suspicious activities) that need 
to be collected and interpreted. This is an issue because a substantial number 
of these alarms (up to 90% for some IDSes) may be false alarms, whereas these 
IDSes may still miss real attacks [12,13]. Our experience has shown that the 
processing of IDS alarms becomes even more challenging when considering a 
large-scale deployment of IDSes. 

Like systems in general, IDSes can be evaluated in various ways, such as
benchmarking or modeling. As we feel that benchmarking real IDSes [12,13] is 
not generic and systematic enough [14] for our evaluation needs, we investigated 
another approach, namely testing for IDS capabilities and not the implementa- 
tion of IDSes. One of the advantages of our approach is that we believe it enables 
us to evaluate a given IDS for its ability to detect a given attack even in the case 
where the corresponding attack signature has not yet been written for the IDS 
considered. Furthermore our approach is more generic and requires a relatively 
limited effort. 



1.1 Scope 

The long-term goal of this work is to provide a framework that allows the effi- 
cient operation of a large-scale ID architecture. This paper describes a first step 
towards evaluating IDSes in terms of their strengths and weaknesses. These eval- 
uation results will allow us to validate and to improve ID-architecture designs 
and to identify measures to process and interpret IDS alarms. 

Our approach describes IDSes and their environment by formalizing their 
characteristics. We do not try to describe the implementation of the ID algo- 
rithms used. Rather, our approach focuses on the description of attacks and 
activities in general. That is to say we are describing attacks in terms of the IDS 
characteristics required for their detection. 

In this first step we describe a Boolean-only approach, which is based on 
rules that express IDS characteristics required for the generation of an alarm. 
We claim that this approach enables us to analyze an IDS systematically by the 
output expected for a given input. The output we consider is the list of alarms 
and the set of diagnostic information (IP source address, user ID etc.) an IDS 
potentially generates for a given input. 

In order to limit the scope of this paper, we focus on network-based and 
knowledge-based IDSes as defined in [15]. Network-based IDSes are IDSes that 
monitor the traffic on a network, whereas knowledge-based IDSes monitor their 
information source for known suspicious activities [15]. Furthermore we restrict 
ourselves to a Boolean representation of IDS characteristics. The expansion to 
a non-Boolean notion of IDS characteristics and other types of IDSes is subject 
to further research. 



1.2 Outline 

We start this work by introducing an example in Section 2. In Section 3 we 
identify the IDS and environment characteristics relevant to describing the char- 
acteristics of IDSes. Based on these IDS characteristics we then discuss in Section 
4 how attacks and non-attacks can be described so they can be used to analyze 
an IDS. In Section 5 we propose a mechanism to evaluate IDSes, which uses 
IDS characteristics and activities identified in the preceding sections. Section 6 
concludes this work with a proposal of avenues for future work. 
2 Simple Example 

Our approach describes attacks and non-attacks, i.e. activities in general, in 
terms of the IDS characteristics required for them to be reported by an IDS. 
Activities can be defined as a sequence of events that may lead to a system state 
transition.¹

As we need to formalize the description of IDS characteristics in order to 
describe activities, we propose that IDS characteristics be expressed by means 
of Boolean properties. These properties describe the various characteristics, ca- 
pabilities, configuration settings etc. that are inherent to a given IDS. 

In order to illustrate our notion of properties we introduce the following 
example of a well-known sendmail (a Unix mail system) vulnerability to which 
we are going to refer in further sections. An ancient version of sendmail allowed 
the unauthorized execution of arbitrary commands on the target host [16]. This 
was possible by supplying a UNIX command preceded by the pipe symbol “|” 
within the “from” field of a mail message sent over SMTP (simple mail transfer 
protocol). If at the same time an invalid destination address was supplied, which 
caused the message to bounce, sendmail executed the offending command while 
trying to deliver the bounced message to the mail folder of the sender (specified 
in the “from” field) of the message. Considering network-based IDSes, we can 
identify the following characteristics and capabilities required for detecting this 
attack on the network: 

— Pattern recognition algorithm - the IDS must be capable of recognizing the 
offending character sequence. 

— SMTP awareness - basic capability to treat SMTP traffic. We consider the 
awareness of a protocol to be an IDS’s capability to recognize a given pro- 
tocol based on protocol identifiers, layer 4 port numbers etc. This does not 
necessarily mean that the IDS is capable of verifying the correctness of the 
protocol sequence observed or to perform any further analysis of the protocol 
sequence. 

— TCP awareness - basic capability to treat TCP traffic. 

— IP awareness - basic capability to treat IP traffic. 

Furthermore, the following configuration characteristics appear to be re- 
quired: 

— Known attack - the characteristics of the attack must be known to the IDS. 

— Enabled alarm - the reporting of this attack must be enabled in the IDS’s 
configuration. 

Having identified all these characteristics, the knowledgeable reader might 
argue that the use of such an IDS might result in false negatives. A false negative 
is a non-event that is an instance of failure of the IDS. Such a failure manifests 
itself in the fact that the alarm describing an attack launched against the system 

¹ Note that activities do not necessarily represent an attack. In fact, most activities 
observed are completely normal. 
to be protected is not generated. The term failure has been defined by the 
dependability community [17] and represents in our case the fact that an IDS 
did not fulfill the requirements concerning the generation of alarms. The threat 
of false negatives becomes clearer when considering the following issues: 

— Attack variation - knowing the definitions of the IDS’s set of known attacks, 
the adversary may launch subterfuge attacks (slightly modified attacks [18]), 
which do not match any of the attack descriptions. An example of such 
a variation is IP fragmentation. Adversaries may send attack sequences in 
a fragmented IP PDU (protocol data unit). To detect such an attack the 
IDS must be able to reassemble IP fragments, which is a functionality not 
commonly implemented by IDSes. Further examples of attack variations, 
not related to the SMTP one, such as TCP stream slicing or hexadecimal 
encoding of URLs can be found when looking at tools such as whisker [19]. 

— Overload situation - the machine monitoring the information source, e.g. the 
network, may be overwhelmed by the amount of information to be inspected. 
This may occur if applications are draining resources from the machine hosting 
the IDS or if the network is heavily loaded. 

— Information loss - Information to be examined may be lost, e.g. a network 
interface may lose PDUs owing to misreception. If this occurs to the packet 
containing the offending sequence, the IDS will not recognize this attack. 

As mentioned in the introduction, IDSes may also fail by generating false 
alarms, also called false positives. In the context of this work a false positive can 
be defined as an alarm with erroneous semantics.² This issue can be illustrated 
by considering a variation of the SMTP example where the pipe symbol appears 
within the mail body instead of within the mail header. In this case the IDS can 
only function correctly if it is capable of performing 

— State-full protocol analysis - simply checking the TCP stream for the pipe 
symbol is not enough. The pipe symbol can only be considered an instance 
of an attack if it appears in the “from” field of the mail message. State-full 
protocol analysis requires the IDS to implement a finite state machine. 
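
The difference between checking the whole TCP stream and a state-full analysis can be sketched in Python. This is only an illustration under simplifying assumptions: the state names, the function names and the choice to treat the SMTP "MAIL FROM:" command as the "from" field are ours, and real SMTP parsing is considerably more involved.

```python
def naive_match(lines):
    """Stateless check: flag a pipe symbol anywhere in the stream."""
    return any("|" in line for line in lines)


def stateful_match(lines):
    """Tiny finite state machine: track whether we are in the command
    phase or in the message body, and flag a pipe only in MAIL FROM."""
    state = "COMMAND"                      # hypothetical state names
    for line in lines:
        if state == "COMMAND":
            upper = line.upper()
            if upper.startswith("MAIL FROM:") and "|" in line:
                return True                # pipe in the sender address
            if upper.strip() == "DATA":
                state = "BODY"             # message content follows
        elif state == "BODY":
            if line.strip() == ".":
                state = "COMMAND"          # end of message content
    return False


# Made-up SMTP dialogues, one list element per line sent by the client.
attack = ["HELO a", "MAIL FROM: |/bin/sh", "RCPT TO: nobody", "DATA", "hi", "."]
benign = ["HELO a", "MAIL FROM: bob@example.org", "RCPT TO: alice", "DATA",
          "a | b", "."]
```

On the benign dialogue the stateless check raises an alarm because of the pipe in the body, whereas the state machine does not; both flag the attack dialogue.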



IDS Failures 

This example has outlined a few causes for IDS failures. The operation of IDSes 
is rather costly in terms of resources. This often results in IDSes with limited 
functionality so they can cope with the amount of data they need to examine, 
e.g. TCP sessions may be analyzed on a per-packet basis instead of a stream 
basis [18,19]. In addition the information source used may limit the IDS’s view 
of an activity, e.g. the TCP source port number of a connection does not show 
up in a web-server log file [20]. 
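
The limitation of per-packet analysis mentioned above can be made concrete with a small Python sketch. The byte pattern and the segment contents are made-up examples, and the stream version simply concatenates segments, assuming they arrive in order and without overlap or loss.

```python
PATTERN = b"MAIL FROM: |"   # illustrative signature, not taken from a real IDS


def per_packet_match(segments):
    """Inspect each TCP segment in isolation."""
    return any(PATTERN in seg for seg in segments)


def stream_match(segments):
    """Inspect the reassembled byte stream (in-order segments assumed)."""
    return PATTERN in b"".join(segments)


# The offending sequence is sliced across two segments.
sliced = [b"MAIL FR", b"OM: |/bin/sh\r\n"]
```

With the sliced input only the stream-based check finds the pattern, which is why limiting an IDS to per-packet inspection saves resources at the price of possible false negatives.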

² Having defined false positives and false negatives, it may be worthwhile to mention 
that they may occur concurrently. This corresponds to a false recognition, i.e. the 
required alarm is missing and a false alarm is generated. 
3 IDS Characteristics 

Looking at this discussion of IDS failures we realize that there may be a large 
number of properties that we need to identify and define in order to describe 
IDSes to an adequate level of detail. 

We can distinguish among the characteristics of an IDS according to two 
orthogonal classification schemes. The first scheme separates properties accord- 
ing to the level of detail of the characteristics they describe. The more generic 
group of properties can usually be derived from IDS taxonomies such as [15]. The 
more detailed group of properties describes characteristics related to a specific 
protocol, applications etc. 

The second classification scheme separates properties based on building 
blocks. We can distinguish among the IDS core, the information source, and 
the IDS configuration. Because the IDS configuration usually does not influence 
the IDS characteristics that can be derived from IDS taxonomies, we were not 
able to identify properties representing generic configuration characteristics. 

We trust that these two classification schemes facilitate the definition of prop- 
erties by simplifying the identification of IDS and environment characteristics. 

3.1 Generic Properties 

Generic properties may be derived from an IDS taxonomy [15]. These properties 
are independent of specific protocols, applications etc. 



Information Source 

The information source serves primarily to distinguish between host-based and 
network-based IDSes. Besides its type, further details are required in order to 
characterize the information source: 

— Type - One can distinguish between information sources that are based on 
the network, e.g. sniffers (network-based IDS), and those based on system 
and application logs, e.g. web server logs (host-based IDS). 

— Information loss - Risk (approximated with high, medium and low) of missing 
an information unit (e.g. PDU). 

— Information suppression - Possibility of an adversary to suppress information 
used by the IDS. 

— Information modification - Possibility of an adversary to modify information 
used by the IDS. 

— Information insertion - Possibility of an adversary to insert information used 
by the IDS. 

IDS Core 

Without going into detail one can identify the following sets of generic properties, 
which are not bound to any specific application or protocol: 
— Context awareness - Context awareness is used to describe an IDS’s ability 
to analyze distributed actions, e.g. split routing [18], actions executed under 
differing user IDs on behalf of a single person etc. 

— Location awareness - The location of an IDS in the network is relevant because 
it influences the set of activities that an IDS can observe. In addition it 
influences the value of activity attributes such as the source MAC address of 
an Ethernet frame. This may be used to determine whether the frame source 
is located on the local subnet, which may be of importance for recognizing 
spoofed IP PDUs. 

— Techniques available - Various techniques are required for the recognition of 
activities. Regular expression matching is required for string-based attacks; 
state machines are required for protocol analysis etc. 

— Delay - The delay introduced between the occurrence of an activity and 
the generation of an alarm can be approximated to high, medium and low, 
e.g. an IDS. 

— Load treatable - Depending on the techniques used by the IDS and the 
available resources, the network and machine load that can be sustained by 
an IDS varies and determines the risk of not recognizing an activity due to 
an overload situation [13]. 



3.2 Low-Level Properties 

Again considering our SMTP example it becomes clear that the generic proper- 
ties identified above are not fine-grained enough to represent an IDS. Its aware- 
ness of a given protocol or application may vary, which influences the quality of 
the output generated. 

Information Source 

The information source may be refined by a set of properties describing the 
activity attributes that the information source provides to the IDS, e.g. IP source 
address etc. 

IDS Core 

One can identify lower-level IDS properties describing application, operating 
system or protocol-specific IDS characteristics. 



IDS Configuration 

Another set of properties may be used to specify the set of enabled alarms. A 
set of properties may be used to represent the set of attacks known to an IDS. 
These properties are important because an IDS may fulfill requirements such as 
TCP stream reassembly, but if it does not provide a description of how to detect 
a given attack, it may never generate the corresponding alarm. 
4 Activities 

Based on the two property classification schemes just introduced we can characterize 
IDSes in a way that is suitable for defining activities, as proposed in the following. 

The definition of activities we are proposing is based on rules expressed by 
properties and other rules. In other words, an activity is defined by a set of rules, 
and these rules describe the IDS characteristics required for the generation 
of a given alarm. 

An example (derived from the SMTP example introduced above) of such a 
rule describing the condition for the generation of the alarm alarm.SMTP.pipe 
in the case that an IDS observes the activity A.SMTP.pipe on the network could 
look as follows: 

A.SMTP.pipe->r.alarm.SMTP.pipe = p.infoSrc.type.net & 
    r.tech.patRec & r.prot.SMTP.aware & 
    p.sign.alarm.SMTP.pipe & p.conf.alarm.SMTP.pipe 

The rule can be read as “the activity A.SMTP.pipe may cause an IDS to 
generate the alarm alarm.SMTP.pipe if the information source used is network-based 
and a pattern recognition algorithm is available etc.” To further elucidate 
the example, the semantics of the properties used can be described as follows: 

— p.infoSrc.type.net - true if the information source is the network. 

— r.tech.patRec - true if any type of pattern recognition algorithm is provided 
by the IDS. 

— r.prot.{SMTP | TCP | IP}.aware - true if the IDS has basic capabilities to 
treat the protocols listed. 

— p.sign.alarm.SMTP.pipe - true if the IDS is capable of generating the 
SMTP pipe alarm. 

— p.conf.alarm.SMTP.pipe - true if the SMTP pipe alarm is enabled. 

Note that to improve readability, we suppress p.infoSrc.type.net, 
p.sign.alarm.SMTP.pipe and p.conf.alarm.SMTP.pipe in the following discussions. 
This is possible because they are not relevant to the principles to be 
introduced. 
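
As a minimal sketch of how such a Boolean rule could be evaluated, the following Python fragment treats an IDS description as a dictionary mapping term names to truth values. The dictionary layout, the function name and the example values are assumptions made for illustration; the r.* terms are taken here as already evaluated Booleans.

```python
def smtp_pipe_alarm(ids):
    """Evaluate A.SMTP.pipe->r.alarm.SMTP.pipe for one IDS description.

    `ids` maps term names to booleans; missing terms count as False.
    """
    required = [
        "p.infoSrc.type.net",
        "r.tech.patRec",
        "r.prot.SMTP.aware",
        "p.sign.alarm.SMTP.pipe",
        "p.conf.alarm.SMTP.pipe",
    ]
    # The rule is a pure conjunction, so every term must hold.
    return all(ids.get(term, False) for term in required)


# Hypothetical IDS that knows the attack but has the alarm disabled.
example_ids = {
    "p.infoSrc.type.net": True,
    "r.tech.patRec": True,
    "r.prot.SMTP.aware": True,
    "p.sign.alarm.SMTP.pipe": True,
    "p.conf.alarm.SMTP.pipe": False,
}
```

For `example_ids` the rule evaluates to false, i.e. a false negative caused purely by configuration; enabling the alarm makes the rule true.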

4.1 Notation of Activities and Rule Groups 

Considering the example rule above, one notices that every term has a one-character 
prefix. This prefix indicates the type and complexity of the term. 
Lower case characters indicate a simple property or rule. 
Upper case characters indicate complex constructs such as activities that may 
consist of several rules and properties. In the context of this work we use the 
following prefixes: 

p.* Previously defined properties. 

r.* Rules that may be composed of simple properties, rules or the somewhat 
more complex rule groups. 

G.* Rule groups are used as a writing convention and help us define 
activity variations. We use rule groups to indicate that the rule 
within which they are used has to be expanded to a list of rules, 
i.e. they can be considered placeholders. For each group member a 
list element, i.e. a rule, is created in which the placeholder is replaced 
with the corresponding group element. Using a C-like language, such 
a construct would be implemented as a loop over an array, whereas 
in Prolog one would take advantage of the inference engine to expand 
the group. 

A.* Activities are a construct that may consist of several rules or lists of 
rules expressing conditions an IDS has to fulfill to be able to generate 
a given alarm. 

4.2 Activity Variations 

Adversaries often try to circumvent detection by slightly modifying their attacks. 
A typical example is IP fragmentation. One could envisage defining a separate 
activity for every variation that seems possible. However, this does not seem to be 
advisable because the number of activity variations may be very high — especially 
when considering the fact that an adversary may combine several variations at 
once. 

As mentioned above, activity variations are the main motivation for the intro- 
duction of rule groups. The use of rule groups can be demonstrated by developing 
the simplified SMTP pipe example activity introduced earlier: 

A.SMTP.pipe->r.alarm.SMTP.pipe = 
    r.tech.patRec & r.prot.SMTP.aware 

In this example we define the two remaining terms as follows: 

r.tech.patRec³ = p.tech.stringMatch | p.tech.regexp 
r.prot.SMTP.aware = p.prot.SMTP.aware & G.prot.TCP.aware 

The first term was easily developed to simple properties representing an IDS’s 
capability of performing either simple string matching or more complex regular 
expression matching. The second term has not yet been extended to simple 
properties. G.prot.TCP.aware stands for a group of the various degrees of TCP 
awareness. The resulting list represents the activity variations that may be created 
by an adversary by playing around with TCP-specific features. An example 
is the slicing of the TCP data stream into very small byte sequences, which may 
be used to circumvent detection by IDSes that do not fully reconstruct TCP 
streams [18,19]. One could argue that a property expressing the fact that an 
IDS is SMTP-aware implicitly also requires TCP awareness and that therefore 
the additional term G.prot.TCP.aware is not required. This is not quite correct 
because the degree of TCP awareness influences the activity variations an IDS is 
able to cope with, which is what we try to represent with the term G.prot.TCP.aware. 



³ As this is an example, the list of the group members does not aim to be complete. 
The terms above can be further developed as follows: 

G.prot.TCP.aware = { 
    r.prot.TCP.aware, 
    r.prot.TCP.streamReassembly 
} 

r.prot.SMTP.aware /7 = { 
    p.prot.SMTP.aware & r.prot.TCP.aware, 
    p.prot.SMTP.aware & r.prot.TCP.streamReassembly 
} 

A.SMTP.pipe->r.alarm.SMTP.pipe /7 = { 
    (p.tech.stringMatch | p.tech.regexp) & p.prot.SMTP.aware & 
        r.prot.TCP.aware, 
    (p.tech.stringMatch | p.tech.regexp) & p.prot.SMTP.aware & 
        r.prot.TCP.streamReassembly 
} 

This expansion results in two variations of the SMTP pipe attack, one 
of them requiring the IDS to be capable of reassembling TCP streams. However, 
the activity has not yet been developed into a property-only representation. 
The next step would now be to expand r.prot.TCP.aware and 
r.prot.TCP.streamReassembly according to their definition, which could be 
made as follows: 

r.prot.TCP.aware = p.prot.TCP.aware & G.prot.IP.aware 
r.prot.TCP.streamReassembly = p.prot.TCP.streamReassembly & 
    r.prot.TCP.aware & r.prot.IP.fragmentReassembly⁴ & 
    r.tech.statefull⁵ 

G.prot.IP.aware = { 
    r.prot.IP.aware, r.prot.IP.fragmentReassembly 
} 

We do not carry out the further expansion here because it can 
be done in the same way as the previous expansion steps. Note, however, 
that further activity variations are created by the term 
G.prot.IP.aware. 
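
The expansion of a rule group into a list of rules, described earlier as a loop over an array, can be sketched in Python as follows. The data layout (a conjunction as a list of term names, groups as a dictionary) and the function name are assumptions for illustration; the sketch expands only one level and does not substitute nested rule definitions.

```python
from itertools import product

# Group members as listed in the text; the dictionary layout is an assumption.
GROUPS = {
    "G.prot.TCP.aware": ["r.prot.TCP.aware", "r.prot.TCP.streamReassembly"],
}


def expand(conjunction, groups=GROUPS):
    """Expand a conjunction (list of term names) into one rule per
    combination of group members; non-group terms are kept unchanged."""
    choices = [groups.get(term, [term]) for term in conjunction]
    return [list(combo) for combo in product(*choices)]


rule = ["r.tech.patRec", "p.prot.SMTP.aware", "G.prot.TCP.aware"]
variations = expand(rule)
```

Here `variations` contains two rules, one per member of G.prot.TCP.aware. A conjunction containing two groups of two members each would yield four rules, which illustrates how quickly combined variations multiply.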

4.3 Activity Groups 

By introducing the notion of activity groups (AG.* prefix) whose member activities 
are similar in the sense that an IDS may confuse them, we hope to facilitate 
the finding and definition of activities. Once activity groups have been defined, 
IDSes can be evaluated on a per-activity group basis, which should result in a 
simplified evaluation procedure. 

⁴ r.prot.IP.fragmentReassembly stands for the IDS’s capability to reassemble fragmented 
IP traffic. 

⁵ r.tech.statefull represents the group of techniques that allow an IDS to perform 
a state-full analysis of the activities observed. Those techniques are typically finite 
state machines, Petri nets, etc. 
The systematic identification of suspicious activities that an IDS should report 
is subject to ongoing research and is outlined in the outlook section (Section 6). 
However, for the following we assume that a suspicious activity that may 
potentially cause an IDS to generate alarms has been identified. 

Along with the definition of the activity describing an attack, we propose a 
search for non-attack activities based on common knowledge about IDS failures 
as outlined in Section 2. 

In order to facilitate the definition of activity group members, i.e. activities, 
we introduce a rule describing the characteristics commonly required from an 
IDS so it is able to cope with this group of activities. Reconsidering the SMTP 
pipe example we could define the activity group and the activity as follows: 

AG.SMTP.pipe->r.common = r.tech.patRec & r.prot.SMTP.aware 
A.SMTP.pipe->r.alarm.SMTP.pipe = AG.SMTP.pipe->r.common 

The group AG.SMTP.pipe can now be used to define further activities such as 
the non-malicious activity where a message containing the pipe symbol within 
the message body is transferred over SMTP. Although this is not an attack, an 
IDS may generate an alarm (false positive) if it is performing pattern matching 
on the SMTP data stream only. In order to recognize the situation correctly, 
the IDS needs the ability to analyze SMTP at the protocol level, i.e. the IDS 
must be able to identify the beginning and the end of the message body. The 
corresponding activity A.SMTP.pipe.body could be defined as follows: 

A.SMTP.pipe.body->r.alarm.SMTP.pipe = 
    AG.SMTP.pipe->r.common & 
    !r.prot.SMTP.cmdAware⁶ 

r.prot.SMTP.cmdAware = p.prot.SMTP.cmdAware 
    & r.tech.statefull & r.prot.SMTP.aware 

Having identified such an activity group one can evaluate the behavior of an 
IDS with respect to this group of activities. We expect the evaluation to provide 
information about the activity variations that are detected or ignored correctly 
along with the false positives and the false negatives that a given IDS might 
generate. 
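
The two activities of the group AG.SMTP.pipe can be evaluated side by side with a few lines of Python. The function names and the two IDS descriptions are hypothetical, and the r.* terms that the text has not expanded further are simply given as Booleans.

```python
def common(ids):
    # AG.SMTP.pipe->r.common
    return ids["r.tech.patRec"] and ids["r.prot.SMTP.aware"]


def cmd_aware(ids):
    # r.prot.SMTP.cmdAware
    return (ids["p.prot.SMTP.cmdAware"] and ids["r.tech.statefull"]
            and ids["r.prot.SMTP.aware"])


def alarm_for_attack(ids):
    # A.SMTP.pipe->r.alarm.SMTP.pipe
    return common(ids)


def alarm_for_body(ids):
    # A.SMTP.pipe.body->r.alarm.SMTP.pipe (alarm on a non-attack)
    return common(ids) and not cmd_aware(ids)


# Hypothetical IDS doing pattern matching on the SMTP stream only.
simple_ids = {"r.tech.patRec": True, "r.prot.SMTP.aware": True,
              "p.prot.SMTP.cmdAware": False, "r.tech.statefull": False}
# Hypothetical IDS that also analyzes SMTP commands state-fully.
protocol_ids = {"r.tech.patRec": True, "r.prot.SMTP.aware": True,
                "p.prot.SMTP.cmdAware": True, "r.tech.statefull": True}
```

Both descriptions yield an alarm for the attack, but only `simple_ids` also yields an alarm for the pipe symbol in the message body, i.e. a false positive.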

It is further noteworthy that we expect the definition of activities to be 
reasonably scalable. We believe that the rules that have been identified once can 
be reused within the definition of other activities, e.g. once we have specified the 
rule that represents the IDS capabilities required for treating TCP traffic, this 
rule can be reused when specifying SMTP activities, HTTP activities etc. 

5 IDS Evaluation 

Having identified all the IDS characteristics and activity definitions we propose a 
simple model that allows us to evaluate a given IDS based on its characteristics. 

The IDS evaluation model as proposed in Figure 1 can be used to evaluate 
IDSes systematically with respect to a given list of activities, i.e. activity groups. 

⁶ r.prot.SMTP.cmdAware represents the IDS’s capability to recognize SMTP commands 
and to treat the corresponding data in the appropriate context. 
Fig. 1. Evaluation of an IDS for a given activity 



Once an activity has been analyzed with respect to the IDS characteristics one 
obtains a list of activity variations associated with values that indicate whether 
an alarm would have been generated (IDS output). In order to know which of 
those results represent correct detection, correct non-detection, false positives 
or false negatives, one has to cross-validate (result evaluator) these results. This 
cross-validation is done with the knowledge of whether the activity represents 
an attack and, if so, which (correctAlarms) alarms one would expect the IDS 
to generate (IDS evaluation results). 
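
The cross-validation step can be sketched as a small Python function. The name `evaluate`, the use of sets of alarm names, and the outcome labels as strings are assumptions for illustration; `correct_alarms` plays the role of correctAlarms and is empty for a non-attack.

```python
def evaluate(generated, correct_alarms):
    """Classify one activity variation.

    generated      -- set of alarm names the IDS would raise
    correct_alarms -- set of alarm names expected (empty for a non-attack)
    """
    outcomes = []
    if correct_alarms - generated:         # an expected alarm is missing
        outcomes.append("false negative")
    if generated - correct_alarms:         # an unexpected alarm was raised
        outcomes.append("false positive")
    if not outcomes:
        outcomes.append("correct detection" if correct_alarms
                        else "correct non-detection")
    return outcomes
```

Returning a list allows both failures to be reported for the same variation, which covers the case where the required alarm is missing and a false alarm is generated at the same time.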

5.1 Finding Strengths and Weaknesses 

Let us assume one has characterized a number of IDSes and defined a set of 
activity groups that shall be used to evaluate the IDSes. If we evaluate every 
IDS for every activity defined, we should obtain a clear picture about the types of 
activities a given IDS masters well and the types of activities that cause an IDS 
to fail. Also we hope to be able to compare IDSes based on the number and types 
of failures one can expect on a per-activity basis. We hope that this knowledge 
will enable us to analyze, evaluate and improve more complex ID-architectures 
consisting of several diverse IDSes. 
6 Outlook 

The long-term goal of collecting knowledge about IDSes’ strengths and weaknesses 
is to combine several IDSes in a manner that allows us to reduce 
the failure rate of the ID-architecture as a whole. Furthermore we hope this 
knowledge will enable such an ID-architecture to present a condensed view 
of security-related activities to a security officer or network administrator. We 
hope that this structured, per-activity variation information has an adequate level 
of detail for us to derive good rules and mechanisms to combine the output 
of different IDSes to reach the goals just mentioned. Also we hope that this information 
helps us decide how to distribute different types of IDSes within the 
network. 

However, before being able to address these issues, we first have to validate 
the proposed approach, which we will do by means of an implementation using 
a rule-based language such as Prolog. Also the approach itself may need to be 
extended as we have not yet sufficiently taken the influence of the environment 
into account and are not providing diagnostic information about the activity 
observed by the IDS and the IDS itself. The load on the network, for instance, 
may greatly influence the failure rate of an IDS. Finally we have to investigate 
whether a simple way exists to extend this proposal to behavior-based IDSes 
and host-based IDSes. 

Also the approach concerning how activity groups can be identified is a sub- 
ject of ongoing research. The most straightforward way to identify activity groups 
is probably to take a list of advisories (CERT etc.) and start defining activities 
that cover the vulnerabilities they describe. Whereas this seems to be a valid 
approach [21], it is not particularly systematic. We are considering exploring a 
more formal approach based on so-called fault assumptions [22] as used in the 
dependability research field [17]. In a nutshell the idea is first to identify a list of 
systems and subsystems (network elements, services, applications, middleware, 
files etc.). Then, considering systems as objects, we identify a list of high-level 
operations (methods) that can be executed in the context of a system, e.g. read- 
ing an object attribute etc. We hope to find a common set of methods that could 
then be extended with system-specific operations as required. Finally, we plan 
to consult a classification of vulnerabilities [4,23] and to extract a list of attack 
types (buffer overflow, meta-character tricks, privilege abuse, spoofing etc.). We 
then try to identify attack scenarios that apply for a given method defined for 
a given object, which we hope will allow us to find an adequate set of activity 
groups that can be used to evaluate IDSes. 

7 Conclusion 

In this paper we described an approach that will hopefully allow us to obtain a 
unified, in-depth understanding of the strengths and weaknesses of a given IDS. 
The main objective we addressed is the description of activities by the combi- 
nation of IDS characteristics required for generating the corresponding alarms. 
Furthermore the approach was illustrated by the definition of an activity that 
represents an SMTP attack, along with a nonmalicious activity that potentially 
may cause an IDS to generate a false alarm for the very same attack. 

In addition we described the scope within which this work was conducted 
and outlined our future plans. 
Abstract. This article presents an attack description language. This 
language is based on logic and uses a declarative approach. In the lan- 
guage, the conditions and effects of an attack are described with logical 
formulas related to the state of the target computer system. The vari- 
ous steps of the attack process are associated to events, which may be 
combined using specific algebraic operators. These elements provide a 
description of the attack from the point of view of the attacker. They 
are complemented with additional elements corresponding to the point of 
view of intrusion detection systems and audit programs. These detection 
and verification aspects provide the language user with means to tailor 
the description of the attack to the needs of a specific intrusion detection 
system or a specific environment. 



1 Introduction 

In this article, we study the definition of an attack description language which 
could be used in a diagnosis program to model the various alerts raised by one or 
several Intrusion Detection Systems (IDS), and to reason about the behaviour of 
a potential computer system intruder. This topic is related to several independent 
pieces of previous work. For example, CISL [1] is a language that aims at representing 
specific instances of an intruder attack. Similarly, work currently in progress at 
IETF [2,3,4] proposes a language to describe the alerts raised by different IDS in 
a common framework. These languages focus on the description of specific oc- 
currences of some type of attack or alert. They also address data communication 
issues, and offer a common basis to exchange information between, for example, 
various intrusion detection systems and system administration consoles. 

In our work, we focus on the problem of describing attacks themselves. The 
language we present allows us to define a generic description of an attack opera- 
tion, independently of a specific intrusion detection process in a specific computer 
system. The generic description is then complemented by additional elements re- 
lated to the intrusion detection operation, and verification of the feasibility of 
the attack in the actual computer system target of the attack. This should allow 
us to take into account the specificities of a particular computer system and of 
the IDS used. 

The work performed on CISL and at IETF offers the opportunity to exchange 
information concerning specific attacks or alerts raised by IDS. One objective of 
our approach is to define a language in a syntactic framework compatible with 
the common framework under development at IETF. Furthermore, the language 
should provide means that could be used by a reasoning program. This program 
could provide a detailed and sensible diagnosis concerning a (potential) intrusion 
occurring in the computer system under analysis. 

The structure of this article is as follows: Section 2 presents the context 
of our study and various definitions. Section 3 details the requirements we adopt 
for an attack description language. Section 4 defines the attack description language 
proposed in this paper. Section 5 presents an example of a multi-step 
attack described with this language. Section 6 describes some of the possible 
applications of the attack formalisation. Finally, Section 7 concludes. 

2 Context and Definitions 

Figure 1 presents a high level overview of the various elements appearing in the 
intrusion detection framework and the way they interact with each other. 




Fig. 1. Conceptual model 



In the remainder of this section, we present in more detail some of the components 
of Fig. 1, with respect to the attack process in the first part, and then 
with respect to the intrusion detection process in the second part. 

2.1 Attack Process 

Overview. In the language, we essentially describe an attack as a combination 
of actions, complemented by several statements in relation with the target computer 
system. (We add IDS-related issues to this description in a later step.) 
The core components of an attack description are: 

— A set of conditions to be satisfied in the system target of the attack for this 
attack to succeed, or to be satisfied by the attacker (for instance, specific 
access rights the attacker needs to perform the attack). 

— The effects of a successful attack are the consequences of its performance in 
the system. Such effects can be associated with the occurrence of damage 
in the system (e.g. data destruction, service disruption) or a gain for the 
attacker (for instance the acquisition of knowledge concerning the target 
system). 

— A scenario describes how an attacker combines different actions in order to 
perform the attack described. These actions are associated with the different 
attack steps. One attack step can be an elementary operation performed by 
the attacker, either in the target computer system or in the systems under 
the attacker’s control. Such a step can also correspond to the execution of other 
lower-level attacks. 

With these elements, we address the description of the attack from the point 
of view of the attacker. We describe the attack scenario the attacker should 
perform against a computer system satisfying the conditions of the attack when 
his intention is to bring about the effects of the attack. 



Malicious and Suspicious Actions. We introduce a distinction between two
kinds of attacks: malicious actions and suspicious actions.

We define malicious actions with respect to the security policy of the sys- 
tem. Malicious actions are attacks whose effects lead to direct violation of the 
security policy of the computer system target of the attack. Such attacks are an 
immediate security concern when successful. 

Suspicious actions are defined with respect to malicious actions. We say that
an action is suspicious if it may be used as a step of the scenario of a
malicious action. In other words, an action is suspicious when it can
contribute to the execution of a malicious action.¹
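The two definitions above can be illustrated with a minimal Python sketch. The catalogue `MALICIOUS_SCENARIOS`, the attack name `example_intrusion` and its step names are hypothetical and only serve the illustration; the sketch also assumes that each scenario is flattened into a plain list of steps.

```python
# Hypothetical catalogue: each malicious action maps to the steps of its scenario.
MALICIOUS_SCENARIOS = {
    "example_intrusion": ["probe_services", "list_exports", "mount_partition"],
}


def suspicious_actions(malicious_scenarios):
    """Return every action that may be used as a step of some malicious action."""
    return {step for steps in malicious_scenarios.values() for step in steps}


def classify(action, malicious_scenarios):
    """Classify an action as 'malicious', 'suspicious' or 'other'."""
    if action in malicious_scenarios:
        return "malicious"
    if action in suspicious_actions(malicious_scenarios):
        return "suspicious"
    return "other"
```

Under these assumptions, an action is suspicious only relative to the catalogue of malicious actions, which mirrors the fact that suspicious actions are defined with respect to malicious ones.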



2.2 Detection Process 

Intrusion Detection Issues. In order to account for the attack from the point 
of view of an intrusion detection system, we describe separately what should be 
done to detect the occurrence of an attack. Therefore, we describe additionally : 

¹ In this case, it may be inappropriate to designate suspicious actions with the
term “attack”, since this word usually implies a malicious intention. In this
paper, we use this terminology because the word “attack” seems to be commonly
used in the computer security field to designate the combination of both
malicious and suspicious actions that we want to designate, and does not
necessarily imply a relation with the security objectives of a specific
environment.

— The detection process, i.e. the elementary actions that should be performed
in order to detect an attack. Several operations may be needed to complete
the detection. These detection operations are closely related to the actions
performed by the attacker, but they are clearly not identical. For example,
some of the attack actions may be performed on the attacker's local computer
and are not observable by the IDS of the target computer system. In this
case, the action associated with the detection of such a non-observable
attack action is the empty action (denoted noop in the following).

— We also describe how these detection actions are combined. Such description 
is similar to the description of the attack scenario, but the action types 
involved are distinct : only detection actions are involved in the detection 
scenario, while only attack actions are involved in the attack scenario. 

— Finally, we want to complement detection actions with verification actions. 
These actions aim at evaluating the impact of an attack on the computer 
system. For instance, system checks may indicate if an attack was successful 
or not. Similarly, audit scripts may indicate if the system exhibits a known 
vulnerability. 

For convenience, we use the same word “action” for describing detection,
verification and attack actions in this text. However, notice that, even though
we represent them in the same way in the formalism, there are clearly three
distinct types of actions involved.



IDS Signatures. In order to identify an attack, scenario-based IDS often rely 
on attack signatures. Such signatures are related to detection actions, but often 
they correspond to more pragmatic representations guided by operational con- 
cerns (e.g. performance). Attack signatures incorporate information related to 
all or a subset of the detection actions. Signatures should be directly usable by 
the IDS to examine the events monitored in the computer system. 

Later in the intrusion detection process, the detection of such a signature
in the flow of events monitored by the IDS leads to the generation of an alert.

An additional objective of our attack description language is to establish a 
link between the abstract representation of a detection scenario included in the 
attack process, and the signatures used in practice by IDS to recognise attacks. 
However, it is not the purpose of this paper to develop a language to represent 
attack signatures. 

Notice that the attack signature is one attribute of the detected_by
relationship between the Attack and IDS entities (see Fig. 1). Other attributes
may be reliability and completeness. These attributes represent, respectively,
how reliable and how complete the IDS is with respect to the detection of
a given attack.



Alerts. The terms alarm and alert are often used to refer to the informative
actions performed by IDS when they identify attack actions regarding the system
they monitor. In this article, we consider that both terms are equivalent. We will
only use the term alert in the rest of this paper.

Ideally, an alert should be raised by the IDS only when it detects the execu- 
tion, or the execution attempt, of a malicious action. Indeed, either successful or 
unsuccessful, an intention to violate the system security policy requires the IDS 
to actively request the attention of the security administrator (who is responsi- 
ble for the enforcement of the security policy and possibly specified this policy 
in the first place). 

In practice, suspicious actions frequently lead IDS to raise alerts, even though
the actions detected do not relate directly to the security policy. Of course,
suspicious actions may be related to potential violations of the security policy.
As such, they clearly require attention on the part of the IDS. For example, the
suspicious nature of an action is clearly a logging criterion. We think it should
be up to the security administrator to decide whether he wants to receive an
alert when such attacks are detected.
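As a rough illustration of this policy, the following Python sketch separates logging from alerting. The names `Kind`, `react` and `alert_on_suspicious` are hypothetical, and logging malicious actions as well as alerting on them is an assumption of the sketch, not a statement of the paper.

```python
from enum import Enum


class Kind(Enum):
    MALICIOUS = "malicious"
    SUSPICIOUS = "suspicious"


def react(kind, alert_on_suspicious=False):
    """Return the reactions of the IDS to a detected action.

    Both kinds of actions are logged (assumption). A malicious action, attempted
    or successful, always raises an alert. A suspicious action raises an alert
    only if the security administrator asked for it.
    """
    reactions = ["log"]
    if kind is Kind.MALICIOUS or alert_on_suspicious:
        reactions.append("alert")
    return reactions
```

The flag `alert_on_suspicious` stands for the administrator's decision mentioned above.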

Providing a common framework for the description of alerts is currently one
of the objectives of the IETF [2,3]. As mentioned in the introduction, our goal
in this paper is different, but we aim at defining a language that is compatible
with the IETF framework.

3 Requirements 

3.1 Expressive Power 

First, the language we study should allow us to describe attacks in conformance 
with the different aspects identified in Section 2. 

Furthermore, we follow a declarative approach for the language definition. A 
requirement is to provide a language that allows the definition of the different 
components of an attack description in a declarative manner. Therefore, we see 
that : 

Pre-condition and post-condition. The conditions that should be satisfied
by the computer system for an attack to be feasible are associated with a
pre-condition of the attack. The effects of the successful execution of this
attack are associated with a post-condition. The pre-condition and
post-condition are described by logical conditions. These logical conditions
deal with the states of the computer systems corresponding to the potential
targets of the attack (the computer systems to protect) and with a
representation of the attacker (its knowledge or its rights). Notice that a
given attack may have one or several potential targets.

Scenario. The scenario of the attack from the point of view of the attacker is
described as a combination of events, together with a description of the various
events involved in this scenario. Specifically, the action associated with each
event of the scenario should be identified.

Detection. The actions to perform in order to detect the attack are described in
a similar language, but while the attack scenario contains the actions of the
attacker, the detection scenario only contains detection actions. In
the various attacks we studied, the two scenarios frequently differ. For
example, this is the case when some of the attacker's actions are not observable
by the IDS.

Verification. Generally, the effects of an attack are observable in the computer 
system. Similarly, it is possible to use audit programs [5,6] to test the exis- 
tence of vulnerabilities in the system. We would like to include events associ- 
ated to such verification in the attack description. For example, such events 
correspond to a system failure detection procedure or a specific vulnerability 
test program. 

The scenario-related, detection-related and verification-related parts of one 
attack description correspond to distinct parts of the description. Most notice- 
ably, we do not intend to deduce automatically the detection or verification 
actions from the attack scenario. Similarly, the pre-condition and post-condition 
clauses do not directly correspond to some (logic-based) description of the attack 
scenario. 

With our approach, the occurrence of attack actions is deduced from detection
actions. Similarly, the truth value of the pre-condition and post-condition is
derived from the verification actions. Furthermore, the pre-condition and
post-condition induce constraints on the description of a high-level attack as
the combination of several low-level attacks.

3.2 Modularity 

We require the language to offer the opportunity to describe an attack scenario
using actions corresponding to other, different, lower-level attacks. This
modularity requirement corresponds to the need to describe high-level attacks
using previously defined attacks. Additionally, a high-level attack may also
involve several steps related to practical basic operations.

Indeed, in practice, an attacker is often required to perform several successive 
attacks in order to cause real damage to a computer system, or obtain a signif- 
icant profit. The description language should be modular to offer some way to 
combine attacks as well as basic operations in order to describe such a high-level 
scenario. 

3.3 Deduction 

Finally, as our language is based on a logical representation, it should offer some 
deductive capabilities. More precisely, deduction procedures may be helpful for 
several reasons : 

— Deductive reasoning offers the opportunity to help the user express the
pre-condition and post-condition of a high-level scenario, using the
pre-conditions and post-conditions of the lower-level attacks involved in this
scenario.

— If two different attack scenarios are detected by one or several IDS, logical
reasoning may allow us to take into account directly the fact that these
combined tools have also detected a high-level scenario. This high-level
scenario, composed of the combination of the two scenarios, may correspond to
another attack (or to some steps of it) that may be taken care of automatically
(see also [7] for a similar idea). Such automatic deduction could complement the
atomic detection capabilities of the intrusion detection tools used. The
attack description language should allow us to perform such deductions with
respect to stand-alone attacks as well as complex attacks. However, this
reasoning may require an additional component in the IDS to manage
the information delivered by the detection tools. (The software architecture
of such a deduction module is not studied further in this article.)
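The second kind of deduction can be sketched very roughly in Python. This is only an illustration under strong assumptions: the knowledge base `HIGH_LEVEL` and all attack names are hypothetical, and the sketch ignores the ordering of events and the pre-conditions and post-conditions that a real deduction module would have to check.

```python
# Hypothetical knowledge base: each high-level attack maps to the set of
# lower-level attacks that compose it.
HIGH_LEVEL = {
    "combined_attack": {"attack_a", "attack_b"},
}


def deduce(detected, high_level):
    """Return the high-level attacks whose lower-level attacks were all detected."""
    return sorted(name for name, parts in high_level.items() if parts <= detected)
```

For example, once `attack_a` and `attack_b` have both been reported by the detection tools, `deduce` reports `combined_attack` as well.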

4 Language 

4.1 System Model 

Our system model is presented in Fig. 2. The information associated to system 
states is represented in first order logic using logical predicates. The informa- 
tion associated to system transitions is modelled using events, according to the 
approach presented in [8]. 



[Fig. 2: system model (labels: state 1, state 2)]





L1 — State Description. State descriptions correspond to the definition of the
pre-condition and post-condition of an attack. We use a language, denoted L1,
which is simply the logic of predicates.

Predicates are used to describe properties of the state relevant to the
description of an attack. For example, for network-related attacks, interesting
predicates may be: port(telnet, 23, tcp), local_access(S, U) or
active_service(S, telnetd) (where S and U are logical variables denoting
respectively a computer system and a user).

These predicates are combined using the usual logical connectives ¬, ∨, ∧ to
build the pre-condition and post-condition denoting the conditions and effects
of an attack on the system state. For example, a pre-condition of the form
active_service(S, telnetd) ∧ port(telnet, 23, tcp) denotes the fact that the
telnet network service should be available in the target system on the standard
TCP port for the attack to be performed successfully.

Sometimes, the effect of an attack is simply a knowledge gain for the attacker
about the target system. In order to represent a knowledge gain, we also assume
that language L1 includes a meta-predicate knows. For instance, if A is the
attacker, then knows(A, active_service(S, telnetd)) means that A knows that
telnet is an active service of system S.
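To make the state description more concrete, here is a minimal Python sketch. It assumes that a state is a set of ground predicates encoded as tuples and that a condition is a plain conjunction; variables, negation and disjunction are left out, and the host name `host1` and the helper names are hypothetical.

```python
# A ground predicate is encoded as a tuple: (name, arg1, arg2, ...).
state = {
    ("active_service", "host1", "telnetd"),
    ("port", "telnet", 23, "tcp"),
}


def holds(conjunction, facts):
    """Return True when every predicate of the conjunction is a fact of the state."""
    return all(predicate in facts for predicate in conjunction)


def knows(agent, predicate):
    """Meta-predicate: wrap a predicate as a piece of knowledge held by an agent."""
    return ("knows", agent, predicate)


# Pre-condition: active_service(host1, telnetd) AND port(telnet, 23, tcp)
pre = [("active_service", "host1", "telnetd"), ("port", "telnet", 23, "tcp")]

# Knowledge gained by attacker "a" once the attack has succeeded.
attacker_knowledge = {knows("a", ("active_service", "host1", "telnetd"))}
```

Encoding knows as a predicate that takes another predicate as argument keeps knowledge gains in the same representation as ordinary state properties.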

L2 — Transition Description. In our system model, we associate transitions
with the occurrence of events, and we provide a language for combining these
events in ways similar to event calculus.

We consider that events are objects [8]. These objects are collections of
attributes. Events are defined in a language, denoted L2, based on the logical
operators ¬ and ∧ plus the equality operator = and a set A of attribute names,
with A = {attribute1, attribute2, ...}. If e is an event, attribute_i(e) = v
denotes the fact that the value of attribute attribute_i of e is v.

In the following, we will commonly use the set A = {action, actor, date} to 
define the possible attributes of an event. 

Attribute action is associated with an attack action (e.g. command finger).
Attribute actor is associated with a set {s1, ..., sn} of users performing the
action. When a single user s is involved, we will follow the convention that
s = {s} for convenience. Attribute date corresponds to a time interval [t1, t2]
associated with event e. Similarly, we write t instead of [t, t] for
convenience. We assume that language L2 includes the predicates < and ≤ to
perform comparisons between times.

In summary, the attributes of a specific event e can be expressed in L2 with:
action(e) = a ∧ actor(e) = u ∧ date(e) = [t1, t2]
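The view of events as collections of attributes can be sketched in Python as follows. The class `Event` and the helpers `make_event` and `before` are hypothetical names. The sketch applies the two conventions stated above (a single user s stands for {s}, a single time t stands for [t, t]) and reads "e1 before e2" as "e1 ends strictly before e2 starts", which is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Event:
    action: str                # e.g. the command "finger"
    actor: FrozenSet[str]      # set of users performing the action
    date: Tuple[float, float]  # time interval [t1, t2]


def make_event(action, actor, date):
    """Build an event, applying the conventions s = {s} and t = [t, t]."""
    actors = frozenset([actor]) if isinstance(actor, str) else frozenset(actor)
    interval = date if isinstance(date, tuple) else (date, date)
    return Event(action, actors, interval)


def before(e1, e2):
    """True when e1 ends strictly before e2 starts (uses the < comparison)."""
    return e1.date[1] < e2.date[0]
```

For instance, `make_event("finger", "s", 3)` yields an event whose actor is the singleton {s} and whose date is the interval [3, 3].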

L3 — Combining Events. Then, the event calculus algebra that may be used
to combine several events is introduced via a third language, denoted L3. In L3,
we provide the following operators to combine two events e1 and e2:

1. e1 ; e2 : designates the sequential composition of events e1 then e2;

2. e1 | e2 : designates the parallel unconstrained execution of e1 and e2;

3. ē1 : indicates the absence of e1 in the event flow;

4. e1 ? e2 : represents the non-deterministic choice between e1 and e2;

5. e1 & e2 : designates the synchronised execution of both events e1 and e2
(order is not significant); we have: e1 & e2 = (e1 ; e2) ? (e2 ; e1) ? (e1 | e2).
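The definition of the synchronised execution in terms of the other operators can be illustrated by a small Python sketch. The class names `Ev`, `Seq`, `Par`, `Choice` and `Sync` and the function `expand_sync` are hypothetical; the absence operator is left out, and grouping the three alternatives of the choice to the left is an arbitrary decision of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ev:        # an atomic event
    name: str


@dataclass(frozen=True)
class Seq:       # e1 ; e2
    left: object
    right: object


@dataclass(frozen=True)
class Par:       # e1 | e2
    left: object
    right: object


@dataclass(frozen=True)
class Choice:    # e1 ? e2
    left: object
    right: object


@dataclass(frozen=True)
class Sync:      # e1 & e2
    left: object
    right: object


def expand_sync(expr):
    """Rewrite every e1 & e2 into (e1 ; e2) ? (e2 ; e1) ? (e1 | e2)."""
    if isinstance(expr, Ev):
        return expr
    left, right = expand_sync(expr.left), expand_sync(expr.right)
    if isinstance(expr, Sync):
        return Choice(Choice(Seq(left, right), Seq(right, left)), Par(left, right))
    return type(expr)(left, right)
```

After this rewriting, an expression only contains sequence, parallel and choice nodes, so a recogniser would not need a dedicated case for &.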

Finally, it is possible to include optional events in the attack specification.
We write [e] to denote that e is an optional event. To define [e], we first
define the event constant noevent, corresponding to a null event, such that
action(noevent) = noop, where noop is a no-operation action. Then, we state:
[e] = e ? noevent.

4.2 Full Attack Description

We use the following notation to define all the elements of an attack description:

attack attack_name(arg1, arg2, ...)
  pre : cond ∈ L1
  post : cond ∈ L1
  scenario : expr ∈ L3
    where cond ∈ L2
  detection : expr ∈ L3
    where cond ∈ L2
  verification : expr ∈ L3
    where cond ∈ L2
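The shape of such a description can be mirrored by a simple record type. The following Python sketch is only one possible encoding: the class names `Clause` and `AttackDescription`, the attack `example_attack` and the action `probe` are hypothetical, and formulas are stored as plain strings instead of parsed expressions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clause:
    expr: str = "noevent"                           # event expression (L3)
    where: List[str] = field(default_factory=list)  # event constraints (L2)


@dataclass
class AttackDescription:
    name: str
    args: List[str]
    pre: List[str]                                  # conjunction of L1 conditions
    post: List[str]                                 # conjunction of L1 conditions
    scenario: Clause
    detection: Clause = field(default_factory=Clause)
    verification: Clause = field(default_factory=Clause)


example = AttackDescription(
    name="example_attack",
    args=["Host"],
    pre=["active_service(Host, telnetd)"],
    post=["knows(A, active_service(Host, telnetd))"],
    scenario=Clause("E1", ["action(E1) = probe", "actor(E1) = A"]),
)
```

Leaving the detection and verification clauses at their default value corresponds to an attack for which no detection or verification event is described.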

By convention, inside the description of an attack, names spelled in lower
case or numbers designate constants (e.g. 21, fingerd). Names starting with a
capital letter designate variables (e.g. U, Host, E1). Variables are local
variables: their scope is limited to the attack description where they appear.
Variable declarations are omitted.

Attack names may be used as event actions. Therefore, the action attribute of
one event e appearing in the scenario of an attack named attack1 may correspond
to another lower-level attack attack2. For instance, we indicate this by stating
action(e) = attack2(...) in the where clause of the definition of the scenario of
attack1.

The conditions appearing in where clauses of an attack description are used 
to formulate constraints between the various attributes of the events of the de- 
scription and the variables used in the pre-condition or post-condition. For ex- 
ample, if e is an event of the scenario of the attack, e' a detection event, and 
e" a verification event, we may express the following constraints between the 
attributes of e, e' and e": 

scenario : e
  where action(e) = sniffer(TargetHost, FromHost)
    ∧ actor(e) = User ∧ date(e) = [t1, t2]
detection : e'
  where action(e') = detect_sniffer()
    ∧ actor(e') = FromHost ∧ date(e') = [t1, t2]
verification : e"
  where action(e") = failed_login(TargetHost)
    ∧ actor(e") = User ∧ date(e") = t3 ∧ t2 < t3
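The role of such where clauses can be illustrated by a small Python check. This is a sketch under assumptions: events are encoded as dictionaries, the function name `check_constraints` is hypothetical, and the last constraint is read as "the verification event is dated after the end of the attack interval".

```python
def check_constraints(e, e_det, e_ver):
    """Check the constraints linking a scenario, a detection and a verification event.

    e and e_det carry a 'date' interval (t1, t2); e_ver carries a single time t3.
    """
    t1, t2 = e["date"]
    same_interval = e_det["date"] == (t1, t2)      # both events share [t1, t2]
    same_user = e_ver["actor"] == e["actor"]       # the variable User is shared
    from_host = e_det["actor"] == e["from_host"]   # the variable FromHost is shared
    return same_interval and same_user and from_host and t2 < e_ver["date"]


attack_event = {"actor": "user1", "from_host": "host_b", "date": (10, 20)}
detection_event = {"actor": "host_b", "date": (10, 20)}
verification_event = {"actor": "user1", "date": 25}
```

Sharing a variable between two clauses of the description becomes an equality test between attributes of the corresponding events.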

The language L2 used to describe the attributes of events aims at describing 
these events as data objects. The alert and attack description language cur- 
rently under development at IETF shows similar objectives [3,4]. Furthermore, 
the effort of IETF should provide a more detailed and standard language for 
describing the various attributes associated to occurrences of attacks and alerts. 






Frederic Cuppens and Rodolphe Ortalo 



Such a language could be used to extend L2 in the future to describe with more 
details the components of the events appearing in the attack description. 

The detection-specific and verification-specific parts of the attack description 
(indicated by keywords detection and verification) reveal some redundancy in 
the description. The detection clause describing the actions an IDS should per- 
form to detect the attack execution is linked to the scenario part of the attack 
description (i.e. the view of a potential attacker). Similarly, the verification 
clause is used to check the truth value of a subset of the information provided 
by the pre and post clauses. These pre-condition and post-condition model the 
required opportunities and possible effects of the attack execution respectively. 
A system verification tool should execute audit actions corresponding to these 
formulas to check if a specific computer system is vulnerable to such an attack or 
if a (previously detected) occurrence of this attack has succeeded. Therefore, for
a given system, actions mentioned in the verification clause should correspond
to pre-condition or post-condition checking tests.

As mentioned previously, these additional components of the attack descrip- 
tion exist for specific reasons. First, they identify explicitly the aspects related 
either to an IDS, or to an audit tool that could be used to assess the feasibility 
of an attack in a specific computer system, or the impact of a detected attack. 
Furthermore, these components identify actions that should be performed either 
to detect the occurrence of such attack or to analyse the existing vulnerabilities 
of a computer system. They correspond to an operational view, with respect to 
a security audit or to intrusion detection. 

5 Example 

As an example, we present the description of a simple attack that can be
performed against a computer system whose security configuration is too
permissive. This example illustrates the use of the language and the potential
differences between the various parts of an attack description.

5.1 Overview 

The attack we want to describe involves several steps corresponding to the fol- 
lowing commands : 

1. rpcinfo -p Target-IP 

2. showmount -e Target-IP 

3. showmount -a Target-IP 

4. finger @Target-IP 

5. adduser --uid Userid Username²

6. mount -t Target-partition /mnt 

² Alternatively, vi /etc/passwd and editing actions may be used instead of the
common adduser script.

In this attack, steps 1, 2, 3 and 4 correspond to knowledge acquisition: the
attacker obtains information concerning the various hard disk partitions exported
by the target computer system using the NFS protocol. Step 5 is a local action
from the point of view of the attacker: he creates a new user account on his
own computer with specific parameters matching those identified in the
previous steps. Finally, in step 6 the attacker mounts the target partition on his
computer and obtains access to its content. This last step will succeed only if
the target computer configuration is sufficiently permissive (some method of host
authentication could prevent such an attack).

5.2 Partial High-Level Description 

In an initial phase, the attack is modelled by combining the various steps
identified previously. Attack events, denoted A1, A2, ..., A6, are introduced for
each step of the attack. The actions associated with these events are enumerated.
These actions correspond to lower-level attacks, which are modelled in the next
section.

attack NFS_abuse(Target-IP)
  pre :
  post :
  scenario : ((A1 ; (A2 & A3)) & A4 & A5) ; A6
    where action(A1) = rpcinfo(Target-IP)
      ∧ action(A2) = showmount_e(Target-IP)
      ∧ action(A3) = showmount_a(Target-IP)
      ∧ action(A4) = finger(Target-IP)
      ∧ action(A5) = create_account(Username, Userid)
      ∧ action(A6) = mount(/mnt)
  detection :
    where
  verification :
    where

At this stage, the description is still incomplete. Most notably, the precise
pre-condition and post-condition of the attack are not easy to determine
directly, given that the attack involves several steps. We show in the following
how these elements are related to the pre-conditions and post-conditions of the
lower-level attacks. Such logical relations allow us to complete the description
later in Sect. 5.4.

Similarly, the detection and verification clauses are also left unspecified 
in this initial sketch of the attack description. 

5.3 Elementary Steps 

In this section, we describe the basic steps of the complex attack sketched
previously. In these descriptions, several kinds of events are used: attack
events are denoted Ei, detection events are denoted Fi and system verification
events are denoted Gi.

Step 1.

attack rpcinfo(Target-IP)
  pre : remote_access(A, H) [C1] ∧ ip_address(H, Target-IP) [C2]
    ∧ use_service(H, portmapper) [C3] ∧ use_service(H, mountd) [C4]
  post : knows(A, C3) [P1] ∧ knows(A, C4) [P2]
  scenario : E1
    where action(E1) = rpcinfo -p Target-IP
      ∧ actor(E1) = A
  detection : F1
    where action(F1) = detect(E1)
  verification : G1
    where action(G1) = test_service(portmapper)

Pre-conditions and post-conditions of the attack are expressed using:

— C1 : remote_access(A, H), which means that attacker A has remote network
access to the target host H.

— C2 : ip_address(H, Target-IP), which means that the IP address of host H is
Target-IP.

— C3 : use_service(H, portmapper), the portmapper network service is active
on host H.

— C4 : use_service(H, mountd), the NFS service (daemon program mountd) is
active on host H.

— P1 : knows(A, use_service(H, portmapper)), which means that the attacker
learns that host H uses the network service portmapper.

— P2 : knows(A, use_service(H, mountd)), the attacker learns that host H uses
an NFS daemon.

Step 2.

attack showmount_e(Target-IP)
  pre : C1 ∧ C2 ∧ C4 ∧ exported_partition(H, P) [C5]
  post : knows(A, C5) [P3]
  scenario : E2
    where action(E2) = showmount -e Target-IP
      ∧ actor(E2) = A
  detection : F2
    where action(F2) = detect(E2)
  verification : G2
    where action(G2) = test_service(mountd) [V1]

— C5 means that host H exports the hard disk partition P via NFS.

— P3 means that the attacker A knows that C5 is true.



Step 3. 

attack showmount_a(Target-IP)
  pre : C1 ∧ C2 ∧ C4 ∧ mounted_partition(H, P) [C6]
  post : knows(A, C6) [P4]
  scenario : E3
    where action(E3) = showmount -a Target-IP
      ∧ actor(E3) = A
  detection : F3
    where action(F3) = detect(E3)
  verification : G3
    where action(G3) = V1

— C6 means that the partition P is also a partition locally mounted by host H.

— P4 means that the attacker A knows that C6 is true.



Step 4.

attack finger(Target-IP)
  pre : C1 ∧ C2 ∧ connected_user(U, H) [C7] ∧ userid(U, H, Userid) [C8]
    ∧ use_service(H, fingerd) [C9]
  post : knows(A, C7) [P5] ∧ knows(A, C8) [P6]
  scenario : E4
    where action(E4) = finger @Target-IP
      ∧ actor(E4) = A
  detection : F4
    where action(F4) = detect(E4)
  verification : G4
    where action(G4) = test_service(fingerd)

— C7 means that user U is currently connected to host H.

— C8 means that the user ID associated with the target user name U is Userid
on host H.

— C9 means that host H provides the finger service.

— P5 means that the attacker A knows that C7 is true.

— P6 means that the attacker A knows that C8 is true.

Step 5.

attack create_account(U, Userid)
  pre : root_user(A, Ha) [C10]
  post : userid(U, Ha, Userid) [P7] ∧ knows(A, P7) [P8]
  scenario : E5
    where action(E5) = adduser --uid Userid U
      ∧ actor(E5) = A
  detection : noevent
    where True
  verification : noevent
    where True

— C10 means that the attacker A is a super-user on the attack host Ha.

— P7 means that the user U is now a user of the attack host Ha with a user ID
equal to Userid.

— P8 means that the attacker A knows that P7 is now true.



Step 6. 

attack mount(Mount-point)
  pre : C1 ∧ C2 ∧ C4 ∧ C5 ∧ C8 ∧ P7
    ∧ connected_user(A, Ha) [C11] ∧ owner(Directory, U) [C12]
  post : can_access(A, Directory) [P9]
  scenario : E6
    where action(E6) = mount -t nfs P Mount-point
      ∧ actor(E6) = A
  detection : F5
    where action(F5) = detect(E6)
  verification : G5
    where action(G5) = V1

— C11 means that A is connected to the attack host Ha.

— P7 was described at step 5.

— C12 means that the user U is the owner of some Directory, contained in the
exported partition P.

— P9 means that the attacker A can now access the Directory of U.



5.4 Full Description 

Thanks to the description of the basic attacks presented in the previous section,
we can now complete the partial description given in Sect. 5.2.

More precisely, the pre-conditions and post-conditions mentioned
in the descriptions of rpcinfo, showmount_e, showmount_a, finger, mount
and create_account can be used to automatically propose to the user a valid
pre-condition and post-condition for attack NFS_abuse.

This process can be repeated to identify precisely the events appearing in
the scenario of the attack. This leads to a detailed description of
the scenario clause (compared to the equivalent one shown in Sect. 5.2).
Taking into account the description of the low-level attacks, the high-level
events A1, A2, ..., A6 used in Sect. 5.2 are mapped to the actual commands
corresponding to events E1, E2, ..., E6. Detection and verification events
(F1, ..., F5 and G1, ..., G5) omitted in the partial description of Sect. 5.2
are also extracted from the low-level attacks.

attack NFS_abuse(Target-IP)
  pre : C1 ∧ C2 ∧ C3 ∧ C4 ∧ C5 ∧ C6 ∧ C7 ∧ C8 ∧ C9 ∧ C10 ∧ C11 ∧ C12 ∧ P7
  post : P9
  scenario : ((E1 ; (E2 & E3)) & E4 & E5) ; E6
    where action(E1) = rpcinfo -p Target-IP
      ∧ action(E2) = showmount -e Target-IP
      ∧ action(E3) = showmount -a Target-IP
      ∧ action(E4) = finger @Target-IP
      ∧ action(E5) = adduser --uid Userid U
      ∧ action(E6) = mount -t nfs P /mnt
      ∧ actor(E1) = A ∧ actor(E2) = A
      ∧ actor(E3) = A ∧ actor(E4) = A
      ∧ actor(E5) = A ∧ actor(E6) = A
  detection : ((F1 ; (F2 & F3)) & F4) ; noevent ; F5
    where action(F1) = detect(E1)
      ∧ action(F2) = detect(E2)
      ∧ action(F3) = detect(E3)
      ∧ action(F4) = detect(E4)
      ∧ action(F5) = detect(E6)
  verification : ((G1 ; (G2 & G3)) & G4) ; noevent ; G5
    where action(G1) = test_service(portmapper)
      ∧ action(G2) = test_service(mountd)
      ∧ action(G3) = test_service(mountd)
      ∧ action(G4) = test_service(fingerd)
      ∧ action(G5) = test_service(mountd)

It is possible to simplify the description produced by a systematic
examination of the low-level steps of the attack. Events noevent may be removed
from the detection or verification clauses. We can also note that P7, which
appears in the pre-condition of NFS_abuse because it appears in the
pre-condition of the low-level attack mount() (step 6), also appears in the
post-condition of create_account(). (In fact, step 5 exists for this purpose.)
Therefore, P7 is obtained during the attack process and may be eliminated from
the pre-condition of NFS_abuse entirely. We integrate these simplifications in
the operational description of the attack presented in the next section.
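The simplification of the pre-condition described above (a condition established by an earlier step does not have to hold beforehand) can be sketched in Python. This is an illustration, not the deduction procedure of the paper: it assumes that the steps are given in one linear order, which ignores the & operator of the scenario, and the labels `root_on_attack_host`, `account_exists` and `partition_exported` are hypothetical stand-ins for conditions.

```python
def derive_pre(steps):
    """Derive the pre-condition of a high-level attack.

    steps is an ordered list of (pre, post) pairs, each a set of condition labels.
    A condition required by a step is kept only if no earlier step produced it.
    """
    required, produced = set(), set()
    for pre, post in steps:
        required |= pre - produced
        produced |= post
    return required


steps = [
    ({"root_on_attack_host"}, {"account_exists"}),   # account creation step
    ({"partition_exported", "account_exists"}, {"can_access"}),   # mount step
]
```

With these two steps, `account_exists` is required by the second step but produced by the first one, so it does not appear in the derived pre-condition.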

5.5 Operational Description 

The full description presented in the previous section may not be the most
suitable for an operational implementation of either the detection of the
attack or the audit of the vulnerability of a computer system to this attack.

We now present the detailed attack specification that we adopt for attack
NFS_abuse in this example. This operational description includes manual
modifications of the description provided automatically in Sect. 5.4. These
modifications are detailed in the following.

attack NFS_abuse(Target-IP)
  pre : remote_access(A, H) ∧ ip_address(H, Target-IP)
    ∧ use_service(H, portmapper) ∧ use_service(H, mountd)
    ∧ exported_partition(H, P) ∧ mounted_partition(H, P)
    ∧ connected_user(U, H) ∧ userid(U, H, Userid)
    ∧ use_service(H, fingerd) ∧ root_user(A, Ha)
    ∧ connected_user(A, Ha) ∧ owner(Directory, U)
  post : can_access(A, Directory)
  scenario : ((E1 ; (E2 & E3)) & E4 & E5) ; E6
    where action(E1) = rpcinfo -p Target-IP
      ∧ action(E2) = showmount -e Target-IP
      ∧ action(E3) = showmount -a Target-IP
      ∧ action(E4) = finger @Target-IP
      ∧ action(E5) = adduser --uid Userid U
      ∧ action(E6) = mount -t nfs P /mnt
      ∧ actor(E1) = A ∧ actor(E2) = A
      ∧ actor(E3) = A ∧ actor(E4) = A
      ∧ actor(E5) = A ∧ actor(E6) = A
  detection : ((F1 ; (F2 & F3)) & F4) ; F5
    where action(F1) = detect(E1)
      ∧ action(F2) = detect(E2)
      ∧ action(F3) = detect(E3)
      ∧ action(F4) = detect(E4)
      ∧ action(F5) = detect(E6) ∧ date(F5) = t
  verification : W1
    where action(W1) = foreign_mount() ∧ date(W1) = t'
      ∧ t < t'

In the operational description, we see that, even if six attack steps are
identified, it is desirable that only five detection events F1, F2, ..., F5 be
included in the description of the attack. This difference is due to the remote
and undetectable nature of attack step E5: the creation of an account on a
foreign computer. No detection event is associated with E5. This situation
demonstrates the need to provide a separate description of the detection-related
or verification-related events in the attack description language. The IDS and
attacker points of view may diverge concerning the events associated with an
attack.
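The gap between the attack steps and the detection events can be illustrated by a short Python sketch. It assumes that the scenario is flattened into a list of event names (the operators ; and & are ignored) and that the set of events observable by the IDS is known in advance; the names `detection_events`, `simplify` and `observable` are hypothetical.

```python
NOEVENT = "noevent"


def detection_events(attack_events, observable):
    """Map each attack event to detect(event), or to noevent when the IDS
    of the target system cannot observe it."""
    return [f"detect({e})" if e in observable else NOEVENT for e in attack_events]


def simplify(events):
    """Remove the noevent placeholders from a list of detection events."""
    return [e for e in events if e != NOEVENT]


attack_events = ["E1", "E2", "E3", "E4", "E5", "E6"]
observable = {"E1", "E2", "E3", "E4", "E6"}   # E5 runs on the attacker's own host
```

Six attack events then yield six detection slots, one of which is noevent, and five detection events remain after simplification.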

Furthermore, the detection or verification clauses focus on operational 
issues. They may differ from the clauses deduced logically from the low-level 
attacks description due to system observability or to design choices. Detection 
or verification events may be omitted or changed by the language user. 

This approach was adopted for the definition of the verification clause in the
above description. Instead of enumerating the verification events G1, ..., G5
identified in the basic attacks, we introduced a single system verification
event W1 specific to the description of attack NFS_abuse. In our example, W1 is
related to a hypothetical test check, denoted foreign_mount(). This test program
is supposed to determine the success of the attack by checking directly whether
a foreign computer (outside the local network) successfully obtained access to
some local partition of a local computer. Finally, to indicate that
foreign_mount() should be run only after an attack of type NFS_abuse is
detected, we introduce a constraint concerning the date of the test event W1
with respect to the last detection event F5.

A similar decision could be made with respect to the deduced detection
events F1, F2, ..., F5. In practice, for operational or performance reasons,
most scenario-based IDS do not monitor all the events associated with an attack,
but only a specific and significant subset which constitutes the signature of the
attack. The event(s) composing the signature are chosen in order to correspond
unambiguously to the occurrence of events F1, F2, ..., F5, but they may be less
numerous. Practical detection events may even include totally different events,
for example if user profile modifications are included among the events
mentioned in the detection clause.

These examples show that a separate description of detection-related or 
verification-related events allows greater flexibility and specialisation of the at- 
tack description with respect to a specific environment. However, the language 
may also be used to deduce these components from lower-level attacks using 
an automatic procedure similar to the one used for the pre-condition and the 
post-condition. 



6 Potential Applications 

The attack descriptions written with the language presented in Sect. 4 may be 
used to perform further analysis of an attack process. 

When an IDS raises an alert due to the occurrence of a combination of actions
associated with a specific attack, it is not always possible to decide if the attack
failed or succeeded. Verification actions described in the verification clause of
an attack description may be used to check the success of an occurrence of this
attack by observing its effects. Such verification actions could be triggered by
the alert generated by the IDS. Like IDS alerts, the results of these checks may
not be totally reliable or complete (e.g. if the attacker hides some of the effects
of the attack, or when the target host availability is compromised).

The logical formulas describing the conditions and effects of one attack may
be related to the security properties expected from the system described in the
security policy. Similarly, the chaining and correlation between two attacks A1
and A2 can be studied in more detail. For example, we can say that A1 and A2
are chained if the post-condition of attack A1 logically implies the pre-condition
of attack A2. Two attacks A1 and An are defined as correlated if there exist
some attacks A2, ..., An-1 such that ∀i ∈ [1, n-1], Ai and Ai+1 are chained.*

If the post-condition of an attack A corresponds to a direct violation of the
properties defined in the security policy, we say that this attack A is malicious
(see Sect. 2.1). Furthermore, we say that an attack A is a suspicious action if
attack A is correlated to a malicious action A'. Hence, in the above example,
if the post-condition of An corresponds to a violation of the security policy, An
is a malicious action, and A1, A2, ..., An-1 are suspicious actions.
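
The definitions of chained, correlated, malicious and suspicious attacks can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the language itself: pre- and post-conditions are modeled as sets of atomic facts read as conjunctions, so that logical implication reduces to set inclusion, and the security policy is modeled as a set of forbidden facts. All attack and fact names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attack:
    name: str
    pre: frozenset   # facts required before the attack (read as a conjunction)
    post: frozenset  # facts that hold once the attack has succeeded


def chained(a, b):
    """post(a) implies pre(b); for conjunctions of atoms this is set inclusion."""
    return b.pre <= a.post


def correlated(a, b, attacks):
    """True if some chain a = A1, A2, ..., An = b exists among `attacks`."""
    seen, frontier = {a.name}, [a]
    while frontier:
        current = frontier.pop()
        for nxt in attacks:
            if not chained(current, nxt):
                continue
            if nxt.name == b.name:
                return True
            if nxt.name not in seen:
                seen.add(nxt.name)
                frontier.append(nxt)
    return False


def malicious(a, forbidden):
    """The post-condition directly violates the (simplified) security policy."""
    return bool(a.post & forbidden)


def suspicious(a, attacks, forbidden):
    """The attack is correlated to at least one malicious attack."""
    return any(malicious(m, forbidden) and correlated(a, m, attacks) for m in attacks)


# Hypothetical three-step scenario.
probe = Attack("probe", frozenset({"reachable"}),
               frozenset({"reachable", "knows_service"}))
listing = Attack("listing", frozenset({"knows_service"}),
                 frozenset({"knows_exports"}))
mount_share = Attack("mount_share", frozenset({"knows_exports"}),
                     frozenset({"can_access_directory"}))
attacks = [probe, listing, mount_share]
forbidden = frozenset({"can_access_directory"})
```

In this toy scenario `probe` and `mount_share` are not chained, but they are correlated through `listing`, so `probe` is classified as suspicious while `mount_share` is malicious.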

The combination of events appearing in the detection clause of a high-level
attack - deduced from the description of lower-level attacks - could be used
to create IDS signatures. If an IDS is configurable enough, such automatically
generated signatures may enable an IDS to detect multi-step attacks when
it can recognise the individual steps composing them.

Similarly, the events that appear by default in the verification clause of a 
complex attack could be used to drive an audit tool in order to find existing 
vulnerabilities in the computer system based on simpler system checks. 

The logical formulation of the pre-condition and post-condition of attack
descriptions can be used to build new complex scenarios. For example, if two
attacks A1 and A2 are chained (as defined previously), it is possible to build a new
high-level attack A based on A1 and A2. This makes it possible to consider
multi-step attacks automatically, based on known low-level steps.

The information provided by the detection-specific and verification-specific
parts of an attack description further complements the logical information
provided by the pre-condition and post-condition. Given a suitable notion of
correlation between attacks, these various elements may also be used in a
deduction system to build and check potential intruder plans. Connected with one
or several IDS, such a deduction module could build some of the possible plans of
an attacker, using the alerts generated by the IDS and the possible correlations
between the attacks corresponding to the alerts. Such plans could be revised
dynamically according to the information later provided by the IDS. Furthermore,
the verification process mentioned in the attack descriptions may be used by such
a deduction module to drive automatic audit tools in order to check the absence
or the existence of a vulnerability in the computer system [5,6,9,10]. The
results could be used to validate or reject possible plans of the attacker, or to
assess the gravity of an intrusion with respect to the security objectives of the
target computer system.

* In practice, such a definition of correlation is a strong property, which we call strong
correlation. Weaker definitions can be adopted. For instance, let us consider the
attacks rpcinfo(Target_IP) (see step 1) and showmount_e(Target_IP) (see step 2).
One of the effects of attack rpcinfo(Target_IP) is that the attacker knows that the
target system uses an NFS daemon: knows(A, C4). But, since condition C4 appears
in the pre-condition of attack showmount_e(Target_IP), we can consider that step 1
and step 2 are correlated. This simulates the following reasoning: the attacker has
performed step 1 in order to acquire knowledge useful to perform step 2 of the
NFS_abuse attack. The definitions of weak and strong correlation will be presented
in a forthcoming paper.

Therefore, the definition of a suspicious attack depends on the notion of correlation
adopted.

7 Conclusion 

The attack description language we study in this article shares many common 
elements with the languages defined in the USTAT [11] and IDIOT [12] intrusion 
detection systems. USTAT focuses on state transition analysis, whereas IDIOT 
detects intrusion by pattern-matching a signature against audit records. The 
former is based on finite state machine graphs, while the latter uses a variation 
of Coloured Petri Nets. The intrusion detection system IDES [13] relies also on 
a general rule-based expert system to propose a description of the attacks it 
detects, and follows a declarative approach. 

The main difference of the language we propose with these systems resides 
in the integration in the attack description language of specific components ded- 
icated to the description of the intrusion detection and system verification pro- 
cesses. These descriptions are separated from the actual description of the attack 
process. This separation provides additional degrees of freedom to the language 
used to describe the attack signatures by the operational IDS, or the results 
made available by specific audit programs. 

Notice also that a new component may be added to the attack description 
language to describe the possible reactions to a detected attack (either successful 
or not). This reaction may involve several actions. The event algebra described 
previously may be suitable to describe these events. Such events may correspond 
to passive actions, such as the reconfiguration of the IDS or audit programs, or 
active actions, such as connection termination, etc. 

Finally, the description of the attack process itself incorporates information
concerning the conditions and the effects of the attack using a logical language.
This language offers the opportunity to build a deduction module that could
complement the detection capabilities of IDS with respect to multi-step
intrusions and analysis of the behaviour and intentions of an intruder.

Abstract. The volume of traffic on security mailing lists, bulletin
boards, news forums, et cetera has grown so sharply in recent times
that it is no longer feasible for a systems administrator to follow all
relevant news as a background task; it has become a full-time job. Even
when relevant information does eventually reach the systems administrator,
there is often a dangerous window between public knowledge of a
vulnerability and the administrator's ability to correct it. Automated
response mechanisms are the key to closing these vulnerability windows.
We propose a database of likely areas of vulnerability, called targets, 
in a machine readable and filterable manner so that administrators can 
greatly reduce the amount of security mail to be read. We then propose 
a cryptographically secure service with which semi-trusted third parties 
can act in a manner limited by the system administrator, say shutting 
down a specific service while not allowing general access, to diminish the 
window of vulnerability. 



1 Introduction 

Reading Bugtraq [1] and other sources of security-relevant news and
information [4,10,2,9,8], one notices that the amount of information published on a daily
basis is increasing continuously and rapidly. A few years ago it was possible to
read and act in response to the complete stream of commonly available security 
news. Doing so was one of the many background tasks done by a competent 
systems administrator. Today, by contrast, it is difficult for any one individual 
even to read all security news. 

Systems administrators are generally quite busy. The result is a dangerous
window between public announcement of a vulnerability and the system
administrator’s ability either to update the service, hopefully eliminating the
vulnerability, or to disable the service until an update is available. The fact that
vulnerabilities are often published at night and over weekends makes response
times longer still. Even security services and condensed security news sources have
the disadvantage of offering too much information and often have significantly
longer time delays. Automated response mechanisms can help to reduce these
time delays, thereby effectively making systems more secure.

It is tempting to equate the danger of this window of vulnerability with the
physical analog of leaving a house or office window unlocked and, accordingly, to
consider the chances of attack quite low. Unfortunately this thinking is
incorrect. System crackers build large databases of which versions and revisions
of various software systems are running [6]. It is often possible to use search
engines to locate certain classes of vulnerability (e.g. the presence of
vulnerable CGI programs). Announcement of a vulnerability is generally followed by
immediate wide-scale exploitation of vulnerable systems.

There is often a time delay between discovery of a vulnerability and the
availability of a patch. For many varieties of system, such as embedded
cryptographic devices, upgrades are prohibitively expensive. In these cases it would be
useful to be able to trigger an automatic fall-back behavior when a
catastrophic vulnerability is discovered (such as cryptanalysis of the underlying
algorithms). For example, if a block cipher is discovered to be insecure after deployment,
the scheme could trigger systems to revert to (presumably secure) triple DES
without costly servicing and down time.

Additional problems are posed by consumer devices which effectively have 
no systems administrator. Current multi-function consumer devices, such as PCs 
running Windows 98, Macintoshes, and Win CE and PalmOS PDAs, offer little or
no security. If the next generation of devices is to provide a foundation for
e-commerce and e-society, significantly higher levels of security will be necessary. 
Automated response mechanisms will play a key role in attaining these levels of 
security. 

We propose the creation of a database of likely areas of vulnerability and 
introduce a scheme with which system administrators can use this database to 
reduce greatly the amount of security mail to be read. We further introduce 
mechanisms to allow systems administrators to automate reactions to
vulnerability announcements in a configurable manner based upon the authenticated
announcements by a semi-trusted third party.

Although one wants to allow semi-trusted outside agents to alter the behavior
of internal services, one does not wish to enable new attacks. The ability
to shut down a service is the ability to stage a denial-of-service attack. More
powerful remote abilities lead to more powerful attacks. As such, careful
consideration must be given to how apoptosis services are enabled.

The simplest form of the idea is publication of a collection of pairs
(name, token) where token is the image of a nonce secret under a one-way
hash function. Should a vulnerability be found in name, secret is released into
the environment to trigger various preconfigured responses (presumably to shut
down name or to warn a responsible party).
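
The (name, token) scheme can be illustrated with a short sketch. It assumes SHA-256 as the one-way hash function and a 32-byte random secret; the paper fixes neither choice, and the function names `new_target` and `is_trigger` are hypothetical.

```python
import hashlib
import secrets


def new_target(name):
    """Create a secret nonce and the public (name, token) pair derived from it."""
    secret = secrets.token_bytes(32)            # kept private by the publisher
    token = hashlib.sha256(secret).hexdigest()  # safe to publish
    return secret, (name, token)


def is_trigger(pair, candidate):
    """Check whether a released value is the secret behind a published pair."""
    _, token = pair
    digest = hashlib.sha256(candidate).hexdigest()
    return secrets.compare_digest(digest, token)


# The pair is published in advance; the secret is released only if a
# vulnerability is later found in the named service.
secret, pair = new_target("wu-ftpd-2.6.0(1)")
```

Anyone holding the published pair can check a released value with `is_trigger`, but the pair alone does not reveal the secret, which is the property the scheme relies on.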

2 Apoptosis and Naming 

The term apoptosis comes from biology, where it refers to the programmed
self-destruction of cells. Apoptosis is often initiated when a cell detects that it has
become a threat to the health of the organism, although it has many other
functions.

We propose programmed death for computational services when these
services have been found to threaten the health/security of the entire system (e.g.
when a vulnerability has been found). Cryptographic de/activation and
alteration of mobile agents is proposed in [7], where the term cluelessness is
introduced. The idea is developed, again in the context of mobile agents, in [11],
where the biological analogy is explained.

In apoptosis, unlike necrosis, cells are killed in a controlled manner. Similarly, 
in service apoptosis, services should be shut down in a controlled manner. This 
shutdown would involve sending appropriate warning messages, logging active 
connections, putting up an “out of service” banner, et cetera. 

The important difference between biological and computational apoptosis is
that the computational variety must be secure against abuse and should maintain
a limited trust model. In the biological setting, there is no value in abusing an
apoptosis mechanism. Harmful agents, such as viruses, need a cell to function
properly in order to propagate themselves. In the computational setting, the
situation is very different. More sophisticated control leads to the possibility of
more damaging forms of attack.

The ideal situation, as always, limits trust as much as possible. Use of an
apoptosis service should not grant general access to the system, leak information
from the system, or even grant knowledge of a system’s existence. Schemes such
as Debian’s [5] automatic update mechanism are extremely powerful¹ but are
vulnerable to abuse.

2.1 Naming 

Experience has demonstrated that one of the most difficult problems in
computer science, and in many other fields, is that of naming [3]. Many services
have canonical names such as wu-ftpd-2.6.0(1). Unfortunately, security
announcements related to this daemon might refer to the daemon as any of

— wu-ftpd-2.6.0(1)

— wu ftpd-2.6.0(1)

— wu ftpd 2.6.0(1)

— wu-ftpd 2.6.0(1)

— wu-ftpd-2.6.0.1

— WASHINGTON UNIVERSITY FTP SERVER, RELEASE 2.6.0(1)

— WASHINGTON UNIVERSITY ftpd 2.6.0(1)

— et cetera.

This multiplicity of names makes writing filters that pass on only relevant mes- 
sages to a system administrator very difficult. 
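
To make the difficulty concrete, the following sketch maps the spellings listed above onto one canonical form. The rewriting rules are ad hoc assumptions written for this single daemon; they are not a proposed naming standard, and a real filter would need such rules for every product, which is exactly the burden a shared target database is meant to remove. The function name `normalize_target` is hypothetical.

```python
import re

VARIANTS = [
    "wu-ftpd-2.6.0(1)",
    "wu ftpd-2.6.0(1)",
    "wu ftpd 2.6.0(1)",
    "wu-ftpd 2.6.0(1)",
    "wu-ftpd-2.6.0.1",
    "WASHINGTON UNIVERSITY FTP SERVER, RELEASE 2.6.0(1)",
    "WASHINGTON UNIVERSITY ftpd 2.6.0(1)",
]


def normalize_target(name):
    """Reduce one spelling of the daemon name to a canonical identifier."""
    s = name.lower()
    s = s.replace("washington university", "wu")   # expand the known long form
    s = re.sub(r"\b(ftp server|ftpd)\b", "ftpd", s)
    s = re.sub(r"\brelease\b", "", s)
    s = re.sub(r"\((\d+)\)", r".\1", s)            # 2.6.0(1) -> 2.6.0.1
    s = re.sub(r"[^a-z0-9.]+", "-", s)             # unify separators
    return s.strip("-")


canonical = {normalize_target(v) for v in VARIANTS}
```

With these rules every variant collapses to the same identifier, but only because the rules were tailored to this list.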

It is moreover the case that many vulnerabilities are not properties of the 
services themselves but of their configuration.² In these cases, whereas delivery of

¹ The Debian package management system allows a system to be configured to upgrade
its packages automatically from a trusted system, without requiring a system
administrator’s interaction.

² We are not speaking of incorrect configuration, which can always turn a service into
a vulnerable service, but of proper configuration with vulnerable subservices (as with
the optional MIME decoding with sendmail [10]).



220 James Riordan and Dominique Alessandri 



security announcements (CERT-97-05: CA-97.05.sendmail) is relevant, one does
not wish to disable the entire service but only the relevant subservice. Consideration
must be made as to an appropriate limited execution environment in which 
detection scripts may be run. This would require further research and is not 
discussed in this work. 

3 Terminology 

A security threat³ is any kind of security problem one can encounter on the
Internet or intranets - including the e-business arena. We distinguish among the 
following levels of security threats: 

— Level 1: Service degradation. Any kind of vulnerability that allows an ad- 
versary to impair the performance of the system significantly. 

— Level 2: Denial of service. Any kind of vulnerability that allows an adversary 
to tamper with the system such that it becomes unavailable. 

— Level 3: Information theft. Any kind of vulnerability that allows an adversary
to obtain supposedly secret information (privacy issues).

— Level 4: Information manipulation. Any kind of vulnerability that allows an 
adversary to manipulate or inject data into a system (integrity issues). 

— Level 5: Control of system. Any kind of vulnerability that allows an adversary 
to execute arbitrary code on a system and therefore to compromise a system. 

An apoptosis service (AS) consists of publishing a list of selected products 
(commercial, any kind of freeware or open source), following closely any available 
publications concerning security issues of the selected products. Once a previ- 
ously unknown security threat has been discovered, this information is published 
in a cryptographically secure manner. 

The apoptosis service provider (ASP) is an organization that selects a list of 
products for which it offers an apoptosis service AS to its customers. 

The apoptosis customer (AC) is the receiver of the AS as it is offered by 
the ASP. The AC aims to protect itself by signing an AS contract with its ASP. 
Consequently the AC is required to semi-trust the ASP as the AC grants the ASP 
the power of actively influencing services offered by the AC. 

An apoptosis activation key (AAK) is a secret random string (nonce). It should
be long enough to render brute force attack intractable.

An apoptosis token (AT) consists of the image of an AAK under a cryptographic
one-way function H together with a complete description of a particular service
or subservice instance. This description should include version, platform, and 
configuration information. The token will generally be signed to bind the AAK. 

An apoptosis activation token (AAT) consists of the AAK together with a com- 
plete description of a particular service or subservice instance. This description 
should include version, platform, and configuration information. The token will 
generally be signed to bind the AAK. 

³ It is worth mentioning that the list provided below serves as an example to facilitate
understanding of the remainder of the document.

4 Architecture

The most basic form of fully functional apoptosis built into a daemon might look 
like: 

// apoptosis token AT
at = read_my_AT_from_cfg_file();

// verify AT signature
if (!verify_sign(at)) {
    send_warning("AT signature incorrect.");
    exit(0);
}

// extract activation key hash from AT
akh = extract_ak_hash(at);

while (true) {
    // receive an apoptosis activation token
    aat = receive_aat();
    if (!verify_sign(aat)) {
        // a token with a bad signature is reported and not examined further
        send_warning("AAT signature incorrect. Possible DoS attack.");
    } else {
        // extract the apoptosis activation key
        aak = extract_aak(aat);
        if (hash(aak) == akh) {
            disable_daemon();
            send_warning("Daemon received valid AAT. Daemon stopped.");
            exit(0);
        }
    }

    // normal operation of the daemon
    act_daemonically();
}

As with many password schemes, the important feature is that complete
knowledge of the above code fragment, and in particular of the AT, does not give
one the ability to trigger the shutdown behavior, because the hash function is
one-way. One-way hash functions, digital signatures, secret
sharing, and the constructions from [7] allow us to configure apoptosis services
according to arbitrary trust models.
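
As one illustration of how secret sharing can shape the trust model, the sketch below splits an activation key into two shares, so that a shutdown is triggered only when two independent parties both release their share. This particular arrangement is an assumption made for the example and is not described in the paper; it uses simple XOR splitting and SHA-256, and all function names are hypothetical.

```python
import hashlib
import secrets


def xor_bytes(left, right):
    """Byte-wise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(left, right))


def split_key(aak):
    """Split an activation key into two shares; both are needed to rebuild it."""
    share_one = secrets.token_bytes(len(aak))
    share_two = xor_bytes(aak, share_one)
    return share_one, share_two


def should_shut_down(akh, share_one, share_two):
    """True only if the combined shares hash to the published key hash."""
    candidate = xor_bytes(share_one, share_two)
    digest = hashlib.sha256(candidate).hexdigest()
    return secrets.compare_digest(digest, akh)


aak = secrets.token_bytes(32)
akh = hashlib.sha256(aak).hexdigest()   # the hash stored in the AT
share_one, share_two = split_key(aak)   # handed to two different parties
```

Each share on its own is a uniformly random string, so neither party alone can stage a denial-of-service attack against the daemon.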

The above configuration places the service in the daemon itself. Naturally 
the services could be implemented in a number of different ways, several of 
which do not require modification of the daemons themselves. A special apoptosis 
service could manage all other daemons on the system. This could naturally be 
combined with the meta-daemon inetd. Alternatively, tcp-wrappers [12], rpcbind
or a subsystem management system could easily be modified to implement such
functionality.

4.1 Apoptosis Token Distribution

In the interest of privacy and anonymity, the apoptosis customer AC should be 
able to obtain ATs without providing the information about which ATs he or she 
is actually interested in. This can be achieved by any of the following distribution 
channels: 

— Public data repository: Typically the world wide web (http, ftp, et cetera) 
or services similar to antivirus products. In order to guarantee the property 
mentioned above (not providing information about the exact product ver- 
sions installed), the entire list of recently published ATs (including those of 
interest) has to be downloaded by the customer. 

— Public forum: New ATs are published in public forums such as mailing lists 
or news groups. 

— Product vendor: The vendor of a product may provide this service and dis- 
tribute the appropriate ATs along with every product item shipped. If a 
vendor releases a patch or a new version a new AT is supplied as well. 

— Any other broadcast system. 

— An oblivious transfer mechanism. 



4.2 Distribution of Apoptosis Activation Tokens 

The goal of an AAK once it has been released by the ASP is to reach every AC 
subscribed with the ASP. In order to guarantee a secured distribution the AAK will 
be encapsulated in an AAT. The distribution can be achieved by various means: 

— Apoptosis publication protocol: The ASP publishes the AAK in the AAT by a
special purpose protocol. (This would mean that the ASP establishes a connection
to each of its ACs and transfers the AAK.)

— Mailing list: The ASP publishes the AAT on a mailing list. These messages 
could then be treated on the AC site in an automated fashion. 

— Use of an existing service: The AAT can be distributed by reusing an existing 
service such as http or ftp. (The AAT could be transferred by means of a 
specific URL or by uploading a specific file.) 

— Polling: The AC queries the ASP on a regular basis for newly published AATs. 
(This introduces a possibly dangerous delay between the publication of an 
AAT and the customers’ reactions.) 

4.3 Customer Apoptosis Functionality 

On the customer side the following functionality must be provided: 

— Reception and verification of AATs, 

— Reaction to AATs, 

— Update of apoptosis configuration.

Receipt of AATs There are two different ways to implement apoptosis
functionality on the customer side. In the first approach the apoptosis functionality
is distributed in the products themselves. This means for instance that a web
server (e.g. httpd) interprets arriving AATs by itself and takes appropriate
action (e.g. shuts itself down). The second approach is to introduce an apoptosis
subsystem. This subsystem receives AATs by any of the means listed above and
takes appropriate action such as shutting services down. This subsystem can
be realized by any combination of means such as a separate daemon, a kernel
module, a modified inittab, a modified inetd, modified tcp-wrappers, a modified
rpcbind, et cetera. Before taking any action, the receiver of the AAT has to verify
the ASP’s signature, compute the hash portion of the AT from the AAK
received within the AAT, and check its configuration for any
actions defined for the hash portion of the AT just computed.
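
The receiving steps of an apoptosis subsystem can be sketched as follows. The sketch assumes SHA-256 as the one-way function, represents an AAT as a plain dictionary, and takes the signature check as a caller-supplied function, because the paper does not fix a signature scheme; the field names, the action strings and `handle_aat` itself are hypothetical.

```python
import hashlib


def handle_aat(aat, config, verify_signature):
    """Return the actions configured for a received AAT, or [] if none apply."""
    # Step 1: reject tokens whose ASP signature does not verify.
    if not verify_signature(aat):
        return []
    # Step 2: recompute the hash portion of the AT from the received AAK.
    akh = hashlib.sha256(aat["aak"]).hexdigest()
    # Step 3: look up the actions the customer configured for that hash.
    return list(config.get(akh, []))


# Example configuration: one installed service with two configured actions.
aak = b"example activation key"
config = {
    hashlib.sha256(aak).hexdigest(): ["shut down httpd", "alert administrator"],
}
aat = {"aak": aak, "description": "httpd, hypothetical version"}
```

Keying the configuration by the hash means that the subsystem never needs to store, or even know, an activation key before it is released.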



Reaction to AATs The most obvious action to be taken in case of a matching 
AT is to shut down the respective service. However any combination of other 
actions can also be taken: 

— Shut a service down, 

— Reconfigure a service (e.g. disable a certain functionality, et cetera), 

— Install a patch, 

— Alarm the systems administrator, 

— Execute any customized set of commands. 

If the notion of security threat level is used, one can configure separate actions
depending on the level of the threat to which the AAT refers.

It is important to note that the AC is free to configure the manner in which 
its systems react upon the reception of an AAT as desired. The AC may wish to 
configure its systems differently depending on 

— Their importance for business operations,

— The sensitivity of the information they host,

— The environment in which the systems are operated,

— The degree to which the ASP is trusted,

— Time of day,

— Day of the week,

— et cetera.

The important features are that the reaction is triggered automatically and that 
it is completely customizable by the owner of the system. 
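
A customer-side reaction policy of this kind might look like the following sketch. The concrete rules (office hours from 8:00 to 18:00, full shutdown for levels 4 and 5, and so on) are invented for illustration and are not prescribed by the scheme; the action names and the function `choose_reaction` are hypothetical.

```python
from datetime import datetime


def choose_reaction(level, when, business_critical):
    """Pick the actions for an AAT of the given threat level (1 to 5)."""
    # Nights and weekends: nobody is likely to be around to react manually.
    off_hours = when.weekday() >= 5 or not (8 <= when.hour < 18)
    if level >= 4:
        # Information manipulation or control of system: always shut down.
        return ["shutdown_service", "alert_admin"]
    if level >= 2:
        if off_hours or not business_critical:
            return ["shutdown_service", "alert_admin"]
        # Keep a critical service partly available during office hours.
        return ["disable_subservice", "alert_admin"]
    # Service degradation only: inform the administrator.
    return ["alert_admin"]


monday_morning = datetime(2000, 10, 2, 10, 0)
sunday_morning = datetime(2000, 10, 1, 10, 0)
```

The same AAT can therefore lead to different reactions on different systems, or on the same system at different times.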



Update of Apoptosis Configuration An update of the apoptosis configuration
is required only if a new product or a new version of a product is installed,
or the ASP has started to support a product (installed on the customer system)
that it did not support before. The update procedure has to add the AT for a
specific version of a specific product to the apoptosis configuration. Along with
the AT, the various actions to be taken in case the appropriate AAT is received
have to be defined. The update of the apoptosis configuration can be done either
manually or in an automated fashion. In either case the ASP’s signature should be
verified first for obvious reasons.



4.4 Apoptosis Service Provider 

The apoptosis service provider has two main functionalities to offer: 

— Publication of ATs, 

— Publication of AATs. 



Publication of ATs Whenever a vendor releases a new product, a new version
of a product, or a security-related fix for a product, the ASP has to generate a
new AAT, which is kept secret, and release the corresponding new AT to the public.

Publication of AATs Once the ASP has released a new AT, it commits to re- 
leasing the corresponding AAT to the public in case a security problem in that 
version of a product has been discovered. 

5 Conclusion 

The apoptosis scheme presented can significantly diminish the window of vulner- 
ability of deployed systems, thereby making them more secure. It is arbitrarily 
configurable, which allows the use of one of several apoptosis service providers 
with varying degrees of trust. We feel that such constructions will be
increasingly useful as computing devices become ever more pervasive and system
administrators become increasingly occupied, as with most corporate or
educational systems, or absent, as with home systems and PDAs.



References 

1. bugtraq community. bugtraq archive. Mailing list,
http://www.securityfocus.com/.

2. Computer Incident Advisory Capability. CIAC advisory mailing list. Email News
Bulletin. http://www.ciac.ORG/.

3. Saul A. Kripke. Naming and Necessity. Harvard University Press, 1982. ISBN
0674598466.

4. COAST Lab. Computer operations, audit, and security technology web site. Web
Site, http://www.cs.purdue.edu/coast/.

5. Debian Security Page. Debian project community. Web Site,
http://www.debian.org/security/.

6. James Riordan. Patterns of network intrusion. In Günter Müller and Kai
Rannenberg, editors, Multilateral Security in Communications, Information Security, pages
173-186. Addison-Wesley, 1999.



Target Naming and Service Apoptosis 225 



7. J. Riordan and B. Schneier. Environmental key generation towards clueless agents.
In G. Vigna, editor, Mobile Agents and Security, volume 1419 of LNCS, pages
15-24. Springer, 1998.
http://www.counterpane.com/clueless-agents.html.

8. Bruce Schneier. Cryptogram. Email News Bulletin.
http://www.counterpane.com/.

9. Australian Computer Emergency Response Team. Auscert news letter. Email
News Bulletin, http://www.auscert.org.au/.

10. Computer Emergency Response Team. Cert news letter. Email News Bulletin.
http://www.cert.org/.

11. C. Tschudin. Apoptosis - the programmed death of distributed services. In
J. Vitek and C. Jensen, editors, Secure Internet Programming - Security Issues for
Mobile and Distributed Objects, pages 253-260. Springer, 1999.

12. Wietse Venema. tcp-wrappers-7.6. blurb, anonymous FTP.
ftp://ftp.porcupine.org/pub/security/.



Author Index 



Alessandri, Dominique .... 183, 217 
Atallah, Mikhail J 1 

Biskup, Joachim 28 

Cuppens, Frederic 197 

Dacier, Marc 110 

Das, Kumar 162 

Debar, Herve 110 

Desai, Pragneshkumar H 49 

Flack, Chapman 1 

Flegel, Ulrich 28 

Fried, David J 162 

Ghosh, Anup 66, 93 

Haines, Joshua W 162 

Heye, Laurent 17 

Korba, Jonathan 162 

Kuri, Josue 17 

Lee, Wenke 49 

Lippmann, Richard 162 



Me, Ludovic 17, 130 

Marrakchi, Zakia 130 

McHugh, John 145 

Michael, Christoph 66, 93 

Morin, Benjamin 130 

Navarro, Gonzalo 17 

Nimbalkar, Rahul A 49 

Ortalo, Rodolphe 197 

Patil, Sunil B 49

Riordan, James 217 

Schatz, Michael 93 

Skinner, Keith 80 

Stolfo, Salvatore J 49 

Tran, Thuan T 49 

Valdes, Alfonso 80 

Vivinis, Bernard 130 

Wespi, Andreas 110 

Yee, Kam K 49 




